Preface



Over the past two decades, the Foundations of Software Technology and Theo- 
retical Computer Science (FSTTCS) conferences have been providing an an- 
nual forum in India for the presentation and publication of results in computer 
science from around the world. This volume contains the proceedings of the 23rd 
FSTTCS, organized under the aegis of the Indian Association for Research in 
Computing Science (IARCS).

FSTTCS 2003 attracted over 160 submissions from 29 countries. After obtai- 
ning 521 referee reports within a period of one month, the programme committee 
accepted 33 contributed papers, the maximum that could fit into a two-and-a-half-day
programme. Unfortunately, many good papers had to be turned away. We
thank all the authors for submitting their papers to FSTTCS 2003. We thank the 
reviewers for the tremendous support they provided to the conference through 
their informed and thorough reviews of the papers. We sincerely thank the mem- 
bers of the programme committee for lending their names to the conference and 
for meeting the challenge arising out of the increased number of submissions this 
year. We are especially grateful to Kamal Lodaya who came down to Mumbai 
to assist us during the PC meeting. 

FSTTCS programmes have always featured highly eminent computer scien- 
tists as invited speakers. It is our great pleasure to thank the invited speakers 
of FSTTCS 2003, Randal Bryant, Moni Naor, Joseph Sifakis, Osamu Watan- 
abe and Avi Wigderson, who graciously agreed to speak at the conference and 
contribute to this volume. 

For several years now, topical workshops have been organized together with 
FSTTCS conferences. This year, the conference was preceded by a workshop on 
Advances in Model Checking, and was followed by a workshop on Algorithms for 
Processing Massive Data Sets. We thank the organizers and speakers for agreeing 
to come and share their expertise. 

The PC meeting was held electronically using software originally developed 
by V. Vinay. We thank our colleagues at the Tata Institute of Fundamental 
Research who came forward to help us, in particular, Vishwas Patil, who got the 
software up and kept the system running. We thank the Department of Computer 
Science, IIT Bombay, for hosting the conference. We thank Springer-Verlag for
agreeing to publish the proceedings of this conference, and its editorial team for 
helping us bring out this volume. 
A Cryptographically Sound Security Proof of the
Needham-Schroeder-Lowe Public-Key Protocol* 



Michael Backes and Birgit Pfitzmann 

IBM Zurich Research Lab 
{mbc,bpf}@zurich.ibm.com



Abstract. We prove the Needham-Schroeder-Lowe public-key protocol secure 
under real, active cryptographic attacks including concurrent protocol runs. This 
proof is based on an abstract cryptographic library, which is a provably secure 
abstraction of a real cryptographic library. Together with composition and integrity 
preservation theorems from the underlying model, this allows us to perform the 
actual proof effort in a deterministic setting corresponding to a slightly extended 
Dolev-Yao model. 

Our proof is one of the two first independent cryptographically sound security 
proofs of this protocol. 

It is the first protocol proof over an abstract Dolev-Yao-style library that is in the 
scope of formal proof tools and that automatically yields cryptographic soundness. 
We hope that it paves the way for the actual use of automatic proof tools for this 
and many similar cryptographically sound proofs of security protocols. 



1 Introduction 

In recent times, the analysis of cryptographic protocols has been getting more and more 
attention, and the demand for rigorous proofs of cryptographic protocols has been rising. 

One way to conduct such proofs is the cryptographic approach, whose security 
definitions are based on complexity theory, e.g., [13,12,14,7]. The security of a cryp- 
tographic protocol is proved by reduction, i.e., by showing that breaking the protocol 
implies breaking one of the underlying cryptographic primitives with respect to its cryp- 
tographic definition and thus finally a computational assumption such as the hardness 
of integer factoring. This approach captures a very comprehensive adversary model and 
allows for mathematically rigorous and precise proofs. However, because of probabilism 
and complexity-theoretic restrictions, these proofs have to be done by hand so far, which 
yields proofs with faults and imperfections. Moreover, such proofs rapidly become too 
complex for larger protocols. 

The alternative is the formal-methods approach, which is concerned with the automa- 
tion of proofs using model checkers and theorem provers. As these tools currently cannot 
deal with cryptographic details like error probabilities and computational restrictions, 
abstractions of cryptography are used. They are almost always based on the so-called 
Dolev-Yao model [11]. This model simplifies proofs of larger protocols considerably 

* An extended version of this paper is available as [5]. 



P.K. Pandya and J. Radhakrishnan (Eds.): FSTTCS 2003, LNCS 2914, pp. 1-12, 2003. 
© Springer-Verlag Berlin Heidelberg 2003







and gave rise to a large body of literature on analyzing the security of protocols using 
various techniques for formal verification, e.g., [20,18,15,9,22,1]. 

A prominent example demonstrating the usefulness of the formal-methods approach 
is the work of Lowe [16], where he found a man-in-the-middle attack on the well-known 
Needham-Schroeder public-key protocol [21]. Lowe later proposed a repaired version 
of the protocol [17] and used the model checker FDR to prove that this modified protocol 
(henceforth known as the Needham-Schroeder-Lowe protocol) is secure in the Dolev- 
Yao model. The original and the repaired Needham-Schroeder public-key protocols are 
two of the most often investigated security protocols, e.g., [26,19,25,27]. Various new 
approaches and formal proof tools for the analysis of security protocols were validated 
by showing that they can discover the known flaw or prove the repaired protocol in the 
Dolev-Yao model. 

It is well-known and easy to show that the security flaw of the original protocol in 
the formal-methods approach can as well be used to mount a successful attack against 
any cryptographic implementation of the protocol. However, all prior security proofs of 
the repaired protocol are restricted to the Dolev-Yao model, i.e., no theorem exists that 
allows for carrying over the results of an existing proof to the cryptographic approach 
with its much more comprehensive adversary. Although recent research focused on 
moving towards such a theorem, i.e., a cryptographically sound foundation of the formal- 
methods approach, the results are either specific for passive adversaries [3,2] or they do 
not capture the local evaluation of nested cryptographic terms [10,24], which is needed to
model many usual cryptographic protocols. A recently proposed cryptographic library [6] 
allows for such nesting, but has not been applied to any security protocols yet. Thus, it 
is still an open problem to conduct a formal protocol proof that an actual cryptographic 
implementation is secure under active attacks with respect to cryptographic security 
definitions. 

We close this gap by providing a security proof of the Needham-Schroeder-Lowe 
public-key protocol in the cryptographic approach. Our proof is based on the cryp- 
tographic library from [6], which is abstract in the sense needed for theorem provers 
but nevertheless has a provably secure implementation. Together with composition and 
integrity preservation theorems from the underlying model, this allows us to perform 
the actual proof effort in a deterministic setting corresponding to a slightly extended 
Dolev-Yao model. 

Independently of and concurrently with this work, another cryptographically sound proof
of the Needham-Schroeder-Lowe public-key protocol has been developed in [28]. The
proof is conducted from scratch in the cryptographic approach. It establishes a stronger 
security property. The benefit of our proof is that it is sufficient to prove the security of the 
Needham-Schroeder-Lowe protocol based on the deterministic abstractions offered by 
the cryptographic library; then the result automatically carries over to the cryptographic 
setting. As the proof is both deterministic and rigorous, it should be easily expressible in 
formal proof tools, in particular theorem provers. Even done by hand, our proof is much 
less prone to error than a reduction proof conducted from scratch in the cryptographic 
approach. 

We hope that our proof paves the way for the actual use of automatic proof tools for 
this and many similar cryptographically sound proofs of security protocols. In particular,
we are confident that stronger properties of the Needham-Schroeder-Lowe protocol can
be proved in the same way, but this should become much simpler once the transition to
automatic proof tools has been made based on this first, hand-proved example. 

2 Preliminaries 

In this section, we give an overview of the ideal cryptographic library of [6] and briefly 
sketch its provably secure implementation. We start by introducing notation. 

2.1 Notation 

We write “:=” for deterministic and “$\leftarrow$” for probabilistic assignment, and $\downarrow$ is an
error element added to the domains and ranges of all functions and algorithms. The
list operation is denoted as $l := (x_1, \ldots, x_j)$, and the arguments are unambiguously
retrievable as $l[i]$, with $l[i] = \downarrow$ if $i > j$. A database $D$ is a set of functions, called entries,
each over a finite domain called attributes. For an entry $x \in D$, the value at an attribute
att is written x.att. For a predicate pred involving attributes, D[pred] means the subset
of entries whose attributes fulfill pred. If D[pred] contains only one element, we use the
same notation for this element. Adding an entry x to D is abbreviated $D :\Leftarrow x$.

2.2 Overview of the Ideal and Real Cryptographic Library 

The ideal (abstract) cryptographic library of [6] offers its users abstract cryptographic op- 
erations, such as commands to encrypt or decrypt a message, to make or test a signature, 
and to generate a nonce. All these commands have a simple, deterministic semantics. To 
allow a reactive scenario, this semantics is based on state, e.g., of who already knows 
which terms; the state is represented as a database. Each entry has a type (e.g., “cipher- 
text”), and pointers to its arguments (e.g., a key and a message). Further, each entry 
contains handles for those participants who already know it. A send operation makes 
an entry known to other participants, i.e., it adds handles to the entry. The ideal crypto- 
graphic library does not allow cheating. For instance, if it receives a command to encrypt 
a message m with a certain key, it simply makes an abstract database entry for the ci- 
phertext. Another user can only ask for decryption of this ciphertext if he has obtained 
handles to both the ciphertext and the secret key. 
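
The handle-based database described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the library of [6]: the class name `ToyIdealLibrary`, the method names, and the representation of handles as per-participant integers are all hypothetical, and lists, signatures, and the adversary's extended capabilities are left out. Returning `None` stands in for the error element.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    type: str                                 # "nonce", "ske", "pke" or "enc"
    args: tuple = ()                          # indices of argument entries
    hnd: dict = field(default_factory=dict)   # participant -> local handle


class ToyIdealLibrary:
    """Toy stand-in for an ideal library: terms live in a database and
    participants only ever see handles to them (hypothetical sketch)."""

    def __init__(self):
        self.db = []       # the database D
        self._next = {}    # per-participant handle counter

    def _give(self, who, idx):
        # Give `who` a handle to entry idx (reusing an existing one).
        entry = self.db[idx]
        if who not in entry.hnd:
            self._next[who] = self._next.get(who, 0) + 1
            entry.hnd[who] = self._next[who]
        return entry.hnd[who]

    def _add(self, owner, type_, args=()):
        self.db.append(Entry(type_, tuple(args)))
        idx = len(self.db) - 1
        return idx, self._give(owner, idx)

    def _lookup(self, who, handle):
        # Resolve a local handle; None plays the role of the error element.
        for idx, entry in enumerate(self.db):
            if entry.hnd.get(who) == handle:
                return idx
        return None

    def gen_nonce(self, u):
        return self._add(u, "nonce")[1]

    def gen_keypair(self, u):
        sk_idx, sk_h = self._add(u, "ske")
        _, pk_h = self._add(u, "pke", (sk_idx,))
        return sk_h, pk_h

    def encrypt(self, u, pk_h, msg_h):
        pk, m = self._lookup(u, pk_h), self._lookup(u, msg_h)
        if pk is None or m is None or self.db[pk].type != "pke":
            return None
        # No real encryption happens: just a new abstract entry.
        return self._add(u, "enc", (pk, m))[1]

    def decrypt(self, u, sk_h, c_h):
        sk, c = self._lookup(u, sk_h), self._lookup(u, c_h)
        if sk is None or c is None or self.db[c].type != "enc":
            return None
        pk, m = self.db[c].args
        if self.db[pk].args != (sk,):   # wrong secret key
            return None
        return self._give(u, m)

    def send(self, sender, receiver, h):
        # Sending only adds a handle for the receiver.
        idx = self._lookup(sender, h)
        return None if idx is None else self._give(receiver, idx)
```

In this toy model a participant who has obtained only a handle to a ciphertext, but none to the matching secret key, gets `None` back from `decrypt`, which mirrors the statement that the ideal library does not allow cheating.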

To allow for the proof of cryptographic faithfulness, the library is based on a detailed
model of asynchronous reactive systems introduced in [24] and represented as a deterministic
machine $TH_{\mathcal{H}}$, called trusted host. The parameter $\mathcal{H} \subseteq \{1, \ldots, n\}$ denotes
the honest participants, where n is a parameter of the library denoting the overall number
of participants. Depending on the considered set $\mathcal{H}$, the trusted host offers slightly
extended capabilities for the adversary. However, for current purposes, the trusted host
can be seen as a slightly modified Dolev-Yao model together with a network and intruder
model, similar to “the CSP Dolev-Yao model” or “the inductive-approach Dolev-Yao
model”.

The real cryptographic library offers its users the same commands as the ideal one, 
i.e., honest users operate on cryptographic objects via handles. The objects are now real 
cryptographic keys, ciphertexts, etc., handled by real distributed machines. Sending a
term on an insecure channel releases the actual bitstring to the adversary, who can do 
with it what he likes. The adversary can also insert arbitrary bitstrings on non-authentic 
channels. The implementation of the commands is based on arbitrary secure encryption 
and signature systems according to standard cryptographic definitions, e.g., adaptive 
chosen-ciphertext security in case of public-key encryption, with certain additions like
type tagging and additional randomizations. 

The security proof of [6] states that the real library is at least as secure as the ideal 
library. This is captured using the notion of simulatability, which states that whatever 
an adversary can achieve in the real implementation, another adversary can achieve 
given the ideal library, or otherwise the underlying cryptography can be broken [24]. 
This is the strongest possible cryptographic relationship between a real and an ideal 
system. In particular it covers active attacks. Moreover, a composition theorem exists in 
the underlying model [24], which states that one can securely replace the ideal library 
in larger systems with the real library, i.e., without destroying the already established 
simulatability relation. 

3 The Needham-Schroeder-Lowe Public-Key Protocol 

The original Needham-Schroeder protocol and Lowe’s variant consist of seven steps, 
where four steps deal with key generation and public-key distribution. These steps are 
usually omitted in a security analysis, and it is simply assumed that keys have already 
been generated and distributed. We do this as well to keep the proof short. However, the 
underlying cryptographic library offers commands for modeling the remaining steps as 
well. The main part of the Needham-Schroeder-Lowe public-key protocol consists of 
the following three steps, expressed in the typical protocol notation as, e.g., in [16]. 

1. $u \rightarrow v : E_{pk_v}(N_u, u)$

2. $v \rightarrow u : E_{pk_u}(N_u, N_v, v)$

3. $u \rightarrow v : E_{pk_v}(N_v)$.

Here, user u seeks to establish a session with user v. He generates a nonce $N_u$ and sends
it to v together with his identity, encrypted with v’s public key (first message). Upon
receiving this message, v decrypts it to obtain the nonce $N_u$. Then v generates a new
nonce $N_v$ and sends both nonces and her identity back to u, encrypted with u’s public
key (second message). Upon receiving this message, u decrypts it and tests whether the
contained identity v equals the sender of the message and whether u earlier sent the first
contained nonce to user v. If yes, u sends the second nonce back to v, encrypted with
v’s public key (third message). Finally, v decrypts this message; and if v had earlier sent
the contained nonce to u, then v believes that she spoke with u.
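
The three messages and the checks performed by u and v can be played through in a purely symbolic toy model. The sketch below is an illustration only and makes strong assumptions: encryption is modeled as a tagged record that only the named recipient may open, and the names `Enc`, `Party`, `open_as`, and the method names are hypothetical, not taken from the paper or from any library. It also shows why the identity in the second message matters: a second message produced by v is rejected by u if u believes it is talking to someone else.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Enc:
    """Symbolic ciphertext: only the party named in `to` may open it."""
    to: str
    payload: tuple


def open_as(me, c):
    if c.to != me:
        raise PermissionError("not the intended recipient")
    return c.payload


class Party:
    def __init__(self, name):
        self.name = name
        self.sent = {}   # peer -> nonces this party generated for that peer

    def _fresh(self, peer):
        n = secrets.token_hex(8)
        self.sent.setdefault(peer, set()).add(n)
        return n

    def start(self, peer):
        # Message 1: E_pk_peer(N_u, u)
        return Enc(peer, (self._fresh(peer), self.name))

    def respond(self, msg1):
        # Message 2: E_pk_u(N_u, N_v, v); returns the claimed initiator too.
        n_u, u = open_as(self.name, msg1)
        return u, Enc(u, (n_u, self._fresh(u), self.name))

    def confirm(self, claimed_sender, msg2):
        # Message 3: E_pk_v(N_v), sent only if both checks succeed.
        n_u, n_v, v = open_as(self.name, msg2)
        if v != claimed_sender or n_u not in self.sent.get(v, set()):
            raise ValueError("abort: identity or nonce check failed")
        return Enc(v, (n_v,))

    def finish(self, claimed_sender, msg3):
        # v accepts iff it earlier sent the contained nonce to the sender.
        (n_v,) = open_as(self.name, msg3)
        return n_v in self.sent.get(claimed_sender, set())


if __name__ == "__main__":
    u, v = Party("u"), Party("v")
    m1 = u.start("v")
    _, m2 = v.respond(m1)
    m3 = u.confirm("v", m2)
    print(v.finish("u", m3))
```

If u instead starts a run with a dishonest party "a" who re-encrypts the first message for v, the reply from v carries the identity v, so `confirm("a", ...)` aborts in this toy model.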

3.1 The Needham-Schroeder-Lowe Protocol Using the Abstract Library 

We now show how to model the Needham-Schroeder-Lowe protocol in the framework
of [24] and using the ideal cryptographic library. For each user $u \in \{1, \ldots, n\}$, we
define a machine $M_u^{NS}$, called a protocol machine, which executes the protocol sketched
above for participant identity u, allowing an arbitrary (polynomial) number of concurrent
executions.¹ This machine is connected to its user via ports $EA\_out_u!$, $EA\_in_u?$ (“EA”
for “Entity Authentication”, because the behavior at these ports is the same for all
entity authentication protocols) and to the cryptographic library via ports $in_u!$, $out_u?$.
The notation follows the CSP convention, e.g., the cryptographic library has a port $in_u?$
where it obtains messages output at $in_u!$. The combination of the protocol machines $M_u^{NS}$
and the trusted host $TH_{\mathcal{H}}$ is the ideal Needham-Schroeder-Lowe system $Sys^{NS,id}$. It
is shown in Figure 1; H and A model the arbitrary joint honest users and the adversary,
respectively.

Fig. 1. Overview of the Needham-Schroeder-Lowe Ideal System.

Using the notation of [24], the system $Sys^{NS,id}$ consists of several structures
$(M(\mathcal{H}), S(\mathcal{H}))$, one for each value of the parameter $\mathcal{H}$. Each structure consists of a
set $M(\mathcal{H}) := \{TH_{\mathcal{H}}\} \cup \{M_u^{NS} \mid u \in \mathcal{H}\}$ of machines, i.e., for a given set $\mathcal{H}$ of honest
users, only the machines $M_u^{NS}$ with $u \in \mathcal{H}$ are actually present in a system run. The
others are subsumed in the adversary. $S(\mathcal{H})$ denotes those ports of $M(\mathcal{H})$ that the honest
users connect to, i.e., $S(\mathcal{H}) := \{EA\_in_u?, EA\_out_u! \mid u \in \mathcal{H}\}$. Formally, we obtain
$Sys^{NS,id} := \{(M(\mathcal{H}), S(\mathcal{H})) \mid \mathcal{H} \subseteq \{1, \ldots, n\}\}$.

In order to capture that keys have been generated and distributed, we assume that
suitable entries for the keys already exist in the database of $TH_{\mathcal{H}}$. We denote the handle
of $u_1$ to the public key of $u_2$ as $pke_{u_2,u_1}^{hnd}$ and the handle of u to its secret key as $ske_u^{hnd}$.

The state of the machine $M_u^{NS}$ consists of the bitstring u and a family
$(Nonce_{u,v})_{v \in \{1, \ldots, n\}}$ of sets of handles. Each set $Nonce_{u,v}$ is initially empty.

¹ We could define local submachines per protocol execution. However, one also needs a dispatcher
submachine for user u to dispatch incoming protocol messages to the submachines by the
nonces, and user inputs to new submachines. We made such a construction once for a case with
complicated submachines [23], and the dispatcher could be reused for all protocols with a fixed
style of distinguishing protocol runs. However, the Needham-Schroeder-Lowe protocol does
not use an existing dispatcher, and for such an almost stateless protocol splitting machines into
submachines rather complicates the invariants.

We now define how the machine $M_u^{NS}$ evaluates inputs. They either come from user u at port
$EA\_in_u?$ or from $TH_{\mathcal{H}}$ at port $out_u?$. The behavior of $M_u^{NS}$ in both cases is described
in Algorithms 1 and 2, respectively, which we explain below. We refer to Step i of
Algorithm j as Step j.i. Both algorithms should immediately abort if a command to
the cryptographic library does not yield the desired result, e.g., if a decryption request
fails. For readability we omit these abort checks in the algorithm descriptions; instead
we impose the following convention.

Convention 1 If $M_u^{NS}$ enters a command at port $in_u!$ and receives $\downarrow$ at port $out_u?$ as
the immediate answer of the cryptographic library, then $M_u^{NS}$ aborts the execution of the
current algorithm, except if the command was of the form list_proj or send_i.

At any time, the user of the machine $M_u^{NS}$ can start a new protocol execution with any
user $v \in \{1, \ldots, n\} \setminus \{u\}$ by inputting (new_prot, v) at port $EA\_in_u?$. Our security proof
holds for all adversaries and all honest users, i.e., especially those that start protocols with
the adversary (respectively a malicious user) concurrently with protocols with honest
users. Upon such an input, $M_u^{NS}$ builds up the term corresponding to the first protocol
message using the ideal cryptographic library $TH_{\mathcal{H}}$ according to Algorithm 1. The
command gen_nonce generates the ideal nonce. $M_u^{NS}$ stores the resulting handle $n_u^{hnd}$
in $Nonce_{u,v}$ for future comparison. The command store inputs arbitrary application
data into the cryptographic library, here the user identity u. The command list forms a
list and encrypt is encryption. Since only lists are allowed to be transferred in $TH_{\mathcal{H}}$
(because the list-operation is a convenient place to concentrate all verifications that no
secret items are put into messages), the encryption is packed as a list again. The final
command send_i means that $M_u^{NS}$ sends the resulting term to v over an insecure channel
(called channel type i). The effect is that the adversary obtains a handle to the term and
can decide what to do with it, e.g., forward it to $M_v^{NS}$, delay it compared with concurrent
protocol executions, or modify it.

The behavior of $M_u^{NS}$ upon receiving an input from the cryptographic library at port
$out_u?$ corresponding to a message that arrives over the network is defined similarly in
Algorithm 2. By construction of $TH_{\mathcal{H}}$, such an input is always of the form $(v, u, i, m^{hnd})$,
where v is the supposed sender, u the recipient, i the channel type “insecure” (the only
type used here), and $m^{hnd}$ the handle to the received message, which is always a list.
$M_u^{NS}$ first decrypts the list content using the secret key of user u, which yields a handle
$l^{hnd}$ to an inner list. This list is parsed into at most three components using the command
list_proj. If the list has two elements, i.e., it could correspond to the first message of the
protocol, $M_u^{NS}$ generates a new nonce and stores its handle in $Nonce_{u,v}$. After that, $M_u^{NS}$
builds up a new list according to the protocol description, encrypts the list and sends it to
user v. If the list has three elements, i.e., it could correspond to the second message of the
protocol, then $M_u^{NS}$ tests whether the third list element equals v and whether the first list
element is already contained in the set $Nonce_{u,v}$. If one of these tests does not succeed,
$M_u^{NS}$ aborts. Otherwise, it again builds up a term according to the protocol description
and sends it to user v. Finally, if the list has only one element, i.e., it could correspond to
the third message of the protocol, then $M_u^{NS}$ tests if the handle of this element is already
contained in the set $Nonce_{u,v}$. If so, $M_u^{NS}$ outputs (ok, v) at $EA\_out_u!$. This signals that
the protocol with user v has terminated successfully, i.e., u believes that he spoke with v.
Algorithm 1 Evaluation of Inputs from the User (Protocol Start)
Input: (new_prot, v) at $EA\_in_u?$ with $v \in \{1, \ldots, n\} \setminus \{u\}$.
1: $n_u^{hnd} \leftarrow$ gen_nonce().
2: $Nonce_{u,v} := Nonce_{u,v} \cup \{n_u^{hnd}\}$.
3: $u^{hnd} \leftarrow$ store(u).
4: $l_1^{hnd} \leftarrow$ list($n_u^{hnd}$, $u^{hnd}$).
5: $c_1^{hnd} \leftarrow$ encrypt($pke_{v,u}^{hnd}$, $l_1^{hnd}$).
6: $m_1^{hnd} \leftarrow$ list($c_1^{hnd}$).
7: send_i(v, $m_1^{hnd}$).



3.2 On Polynomial Runtime 

In order to use existing composition results of the underlying model, the machines $M_u^{NS}$
have to be polynomial-time. Similar to the cryptographic library, we hence define that
each machine $M_u^{NS}$ maintains explicit polynomial bounds on the message lengths and
the number of inputs accepted at each port.



4 Formalizing the Security Property 

The security property that we prove is entity authentication of the initiator. It states that
an honest participant v only successfully terminates a protocol with an honest participant
u if u has indeed started a protocol with v, i.e., an output (ok, u) at $EA\_out_v!$ can only
happen if there was a prior input (new_prot, v) at $EA\_in_u?$. This property and the protocol
as defined above do not consider replay attacks. This can be added to the protocol as
follows: If $M_u^{NS}$ receives a message from v containing a nonce and $M_u^{NS}$ created this
nonce, then it additionally removes this nonce from the set $Nonce_{u,v}$, i.e., after Steps
2.20 and 2.25, the handle $x_1^{hnd}$ is removed from $Nonce_{u,v}$.²

Integrity properties in the underlying model are formally sets of traces at the in-
and output ports connecting the system to the honest users, i.e., here traces at the port
set $S(\mathcal{H}) = \{EA\_out_u!, EA\_in_u? \mid u \in \mathcal{H}\}$. Intuitively, such an integrity property Req
states which are the “good” traces at these ports. A trace is a sequence of sets of events.
We write an event p?m or p!m, meaning that message m occurs at input or output port
p. The t-th step of a trace r is written $r_t$; we also speak of the step at time t. Thus the
integrity requirement $Req^{EA}$ of entity authentication of the initiator is formally defined
as follows:

Definition 1. (Entity Authentication Requirement) A trace r is contained in $Req^{EA}$ if for
all $u, v \in \mathcal{H}$:

\[
\begin{array}{ll}
\exists t_1 \in \mathbb{N} : EA\_out_v!(ok, u) \in r_{t_1} & \text{\# if $v$ believes she speaks with $u$ at time $t_1$} \\
\Rightarrow\ \exists t_0 < t_1 : & \text{\# then there exists a past time $t_0$} \\
\qquad EA\_in_u?(new\_prot, v) \in r_{t_0} & \text{\# in which $u$ started a protocol with $v$}
\end{array}
\]

² Proving freshness and two-sided authentication is certainly useful future work, in particular
once the proof has been automated. We do not intend to prove the property of matching
conversation from [8]. It makes constraints on events within the system; this cannot be expressed
in an approach based on abstraction and modularization. We see it as a sufficient, but not
necessary condition for the desired abstract properties. Some additional properties associated with
matching conversations only become meaningful at an abstract level if one goes beyond entity
authentication to session establishment.

Algorithm 2 Evaluation of Inputs from $TH_{\mathcal{H}}$ (Network Inputs)
Input: $(v, u, i, m^{hnd})$ at $out_u?$ with $v \in \{1, \ldots, n\} \setminus \{u\}$.
1: $c^{hnd} \leftarrow$ list_proj($m^{hnd}$, 1)
2: $l^{hnd} \leftarrow$ decrypt($ske_u^{hnd}$, $c^{hnd}$)
3: $x_i^{hnd} \leftarrow$ list_proj($l^{hnd}$, i) for i = 1, 2, 3.
4: if $x_1^{hnd} \neq \downarrow \wedge x_2^{hnd} \neq \downarrow \wedge x_3^{hnd} = \downarrow$ then {First Message is input}
5:   $x_2 \leftarrow$ retrieve($x_2^{hnd}$).
6:   if $x_2 \neq v$ then
7:     Abort
8:   end if
9:   $n_u^{hnd} \leftarrow$ gen_nonce().
10:  $Nonce_{u,v} := Nonce_{u,v} \cup \{n_u^{hnd}\}$.
11:  $u^{hnd} \leftarrow$ store(u).
12:  $l_2^{hnd} \leftarrow$ list($x_1^{hnd}$, $n_u^{hnd}$, $u^{hnd}$).
13:  $c_2^{hnd} \leftarrow$ encrypt($pke_{v,u}^{hnd}$, $l_2^{hnd}$).
14:  $m_2^{hnd} \leftarrow$ list($c_2^{hnd}$).
15:  send_i(v, $m_2^{hnd}$).
16: else if $x_1^{hnd} \neq \downarrow \wedge x_2^{hnd} \neq \downarrow \wedge x_3^{hnd} \neq \downarrow$ then {Second Message is input}
17:  $x_3 \leftarrow$ retrieve($x_3^{hnd}$).
18:  if $x_3 \neq v \vee x_1^{hnd} \notin Nonce_{u,v}$ then
19:    Abort
20:  end if
21:  $l_3^{hnd} \leftarrow$ list($x_2^{hnd}$).
22:  $c_3^{hnd} \leftarrow$ encrypt($pke_{v,u}^{hnd}$, $l_3^{hnd}$).
23:  $m_3^{hnd} \leftarrow$ list($c_3^{hnd}$).
24:  send_i(v, $m_3^{hnd}$).
25: else if $x_1^{hnd} \in Nonce_{u,v} \wedge x_2^{hnd} = x_3^{hnd} = \downarrow$ then {Third Message is input}
26:  Output (ok, v) at $EA\_out_u!$.
27: end if
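
The entity authentication requirement can be read as a check on finite traces. The following sketch is a hypothetical illustration, not part of the underlying model of [24]: it assumes that a trace is a Python list of sets of events, and that each event is a tuple `(port_kind, owner, (tag, peer))`, where `("EA_out", v, ("ok", u))` stands for the output (ok, u) at v's output port and `("EA_in", u, ("new_prot", v))` for the input (new_prot, v) at u's input port. The function name `satisfies_req_ea` is likewise an assumption.

```python
def satisfies_req_ea(trace, honest):
    """Return True iff every (ok, u) output at an honest v's port, for an
    honest u, is preceded at a strictly earlier step by the input
    (new_prot, v) at u's port."""
    started = set()  # pairs (u, v): u input (new_prot, v) at an earlier step
    for step in trace:
        # First check the outputs of this step against strictly earlier inputs.
        for kind, owner, (tag, peer) in step:
            if kind == "EA_out" and tag == "ok":
                v, u = owner, peer
                if u in honest and v in honest and (u, v) not in started:
                    return False
        # Only now record this step's inputs, so that t0 < t1 is enforced.
        for kind, owner, (tag, peer) in step:
            if kind == "EA_in" and tag == "new_prot":
                started.add((owner, peer))
    return True
```

Note that the check only constrains pairs of honest participants: an (ok, a) output for a dishonest a is always allowed, matching the quantification over honest u and v in Definition 1.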

The notion of a system Sys fulfilling an integrity property Req essentially comes in
two flavors [4]. Perfect fulfillment, $Sys \models^{perf} Req$, means that the integrity property
holds for all traces arising in runs of Sys (a well-defined notion from the underlying
model [24]). Computational fulfillment, $Sys \models^{poly} Req$, means that the property only
holds for polynomially bounded users and adversaries, and only up to a negligible error
probability. Perfect fulfillment implies computational fulfillment.

The following theorem captures the security of the ideal Needham-Schroeder-Lowe 
protocol. 

Theorem 1. (Security of the Needham-Schroeder-Lowe Protocol based on the Ideal
Cryptographic Library) Let $Sys^{NS,id}$ be the ideal Needham-Schroeder-Lowe system
defined in Section 3, and $Req^{EA}$ the integrity property of Definition 1. Then
$Sys^{NS,id} \models^{perf} Req^{EA}$.

5 Proof of the Cryptographic Realization 

Once Theorem 1 has been proven, it follows that the Needham-Schroeder-Lowe protocol
based on the real cryptographic library computationally fulfills the integrity requirement
$Req^{EA}$. The main tool is the following preservation theorem from [4].

Theorem 2. (Preservation of Integrity Properties (Sketch)) Let two systems $Sys_1$, $Sys_2$
be given such that $Sys_1$ is computationally at least as secure as $Sys_2$ (written
$Sys_1 \geq_{sec}^{poly} Sys_2$). Let Req be an integrity requirement for both $Sys_1$ and $Sys_2$, and let
$Sys_2 \models^{poly} Req$. Then also $Sys_1 \models^{poly} Req$.

Let $Sys^{cry,id}$ and $Sys^{cry,real}$ denote the ideal and the real cryptographic library from [6],
and $Sys^{NS,real}$ the Needham-Schroeder-Lowe protocol based on the real cryptographic
library. This is well-defined given the formalization with the ideal library because the
real library has the same user ports and offers the same commands.

Theorem 3. (Security of the Real Needham-Schroeder-Lowe Protocol) Let $Req^{EA}$ denote
the integrity property of Definition 1. Then $Sys^{NS,real} \models^{poly} Req^{EA}$.

Proof. In [6] it was shown that $Sys^{cry,real} \geq_{sec}^{poly} Sys^{cry,id}$ holds for suitable parameters
in the ideal system. Since $Sys^{NS,real}$ is derived from $Sys^{NS,id}$ by replacing the ideal with
the real cryptographic library, $Sys^{NS,real} \geq_{sec}^{poly} Sys^{NS,id}$ follows from the composition
theorem of [24]. We only have to show that the theorem’s preconditions are fulfilled.
This is straightforward, since the machines $M_u^{NS}$ are polynomial-time (cf. Section 3.2).
Now Theorem 1 implies $Sys^{NS,id} \models^{poly} Req^{EA}$, hence Theorem 2 yields
$Sys^{NS,real} \models^{poly} Req^{EA}$.

6 Proof in the Ideal Setting 

This section sketches the proof of Theorem 1, i.e., the proof of the Needham-Schroeder- 
Lowe protocol using the ideal, deterministic cryptographic library. A complete proof 
can be found in the long version of this paper [5], together with a short version of the 
notation of the cryptographic library. The proof idea is to go backwards in the protocol 
step by step, and to show that a specific output always requires a specific prior input.
For instance, when user v successfully terminates a protocol with user u, then u has sent
the third protocol message to v; thus v has sent the second protocol message to u; and
so on. The main challenge in this proof was to find suitable invariants on the state of 
the ideal Needham-Schroeder-Lowe system. This is somewhat similar to formal proofs 
using the Dolev-Yao model; indeed the similarity supports our hope that the new, sound 
cryptographic library can be used in the place of the Dolev-Yao model in automated 
tools. 

The first invariants, correct nonce owner and unique nonce use, are easily proved
and essentially state that handles contained in a set $Nonce_{u,v}$ indeed point to entries of
type nonce, and that no nonce is in two such sets. The next two invariants, nonce secrecy
and nonce-list secrecy, deal with the secrecy of certain terms and are mainly needed to
prove the last invariant, correct list owner, which establishes who created certain terms.

Invariant 1 (Correct Nonce Owner) For all $u \in \mathcal{H}$, $v \in \{1, \ldots, n\}$ and for all
$x^{hnd} \in Nonce_{u,v}$, we have $D[hnd_u = x^{hnd}] \neq \downarrow$ and $D[hnd_u = x^{hnd}].type = nonce$.



Invariant 2 (Unique Nonce Use) For all $u, v \in \mathcal{H}$, all $w, w' \in \{1, \ldots, n\}$, and all
$j \leq size$: If $D[j].hnd_u \in Nonce_{u,w}$ and $D[j].hnd_v \in Nonce_{v,w'}$, then $(u, w) = (v, w')$.

Nonce secrecy states that the nonces exchanged between honest users u and v remain
secret from all other users and from the adversary. For the formalization, note that the
handles to these nonces form the sets $Nonce_{u,v}$. The claim is that the other users and
the adversary have no handles to such a nonce in the database D of $TH_{\mathcal{H}}$:

Invariant 3 (Nonce Secrecy) For all $u, v \in \mathcal{H}$ and for all $j \leq size$: If $D[j].hnd_u \in
Nonce_{u,v}$ then $D[j].hnd_w = \downarrow$ for all $w \in (\mathcal{H} \cup \{a\}) \setminus \{u, v\}$.
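
Nonce secrecy can be phrased as a predicate over a snapshot of the database. The sketch below is illustrative only and rests on assumptions not made in the paper: the database is a Python list of dicts with a `"hnd"` mapping from participants to handles (a missing key stands for the error element), the nonce sets are given as a dict from pairs (u, v) to sets of u's handles, and the function name `nonce_secrecy_holds` is hypothetical.

```python
def nonce_secrecy_holds(db, nonce_sets, honest, adversary="a"):
    """Check, on one database snapshot, that no participant other than
    u and v holds a handle to a nonce recorded in Nonce_{u,v}, for all
    honest u and v."""
    everyone = set(honest) | {adversary}
    for (u, v), handles in nonce_sets.items():
        if u not in honest or v not in honest:
            continue  # the invariant only speaks about honest pairs
        for entry in db:
            if entry["hnd"].get(u) in handles:
                # Entry is a nonce of Nonce_{u,v}: nobody else may know it.
                if any(w in entry["hnd"] for w in everyone - {u, v}):
                    return False
    return True
```

A proof of the invariant has to show that this predicate is preserved by every transition of the system; the function only tests a single state.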

Similarly, the invariant nonce-list secrecy states that a list containing such a handle can 
only be known to u and v. Further, it states that the identity fields in such lists are correct. 
Moreover, if such a list is an argument of another entry, then this entry is an encryption 
with the public key of u or v. 

Invariant 4 (Nonce-List Secrecy) For all $u, v \in \mathcal{H}$ and for all $j \leq size$ with
$D[j].type = list$: Let $x_i^{ind} := D[j].arg[i]$ for i = 1, 2, 3. If $D[x_i^{ind}].hnd_u \in Nonce_{u,v}$
then

- $D[j].hnd_w = \downarrow$ for all $w \in (\mathcal{H} \cup \{a\}) \setminus \{u, v\}$.

- if $D[x_{i+1}^{ind}].type = data$, then $D[x_{i+1}^{ind}].arg = (u)$.

- for all $k \leq size$ we have $j \in D[k].arg$ only if $D[k].type = enc$ and $D[k].arg[1] \in
\{pke_u, pke_v\}$.

The invariant correct list owner states that certain protocol messages can only be con- 
structed by the “intended” users. For example, if a database entry is structured like the 
cleartext of a first protocol message, i.e., it is of type list, its first argument belongs to 
the set Nonceu,v, and its second argument is a non-cryptographic construct (formally 
of type data) then it must have been created by user u. Similar statements exist for the 
second and third protocol message. 

Invariant 5 (Correct List Owner) For all $u, v \in \mathcal{H}$ and for all $j \leq size$ with
$D[j].type = list$: Let $x_i^{ind} := D[j].arg[i]$ and $x_{i,u}^{hnd} := D[x_i^{ind}].hnd_u$ for i = 1, 2.

- If $x_{1,u}^{hnd} \in Nonce_{u,v}$ and $D[x_2^{ind}].type = data$, then D[j] was created by $M_u^{NS}$ in
Step 1.4 (in one of its protocol executions).

- If $D[x_1^{ind}].type = nonce$ and $x_{2,u}^{hnd} \in Nonce_{u,v}$, then D[j] was created by $M_u^{NS}$ in
Step 2.12.

- If $x_{1,u}^{hnd} \in Nonce_{u,v}$ and $x_2^{ind} = \downarrow$, then D[j] was created by $M_v^{NS}$ in Step 2.21.
This invariant is key for proceeding backwards in the protocol. For instance, if v termi-
nates a protocol with user u, then v must have received a third protocol message. Correct 
list owner implies that this message has been generated by u. Now u only constructs 
such a message if it received a second protocol message. Applying the invariant two 
more times shows that u indeed started a protocol with v. 



Acknowledgments. We thank Michael Waidner and the anonymous reviewers for in- 
teresting comments. 



References 

1. M. Abadi and A. D. Gordon. A calculus for cryptographic protocols: The spi calculus. 
Information and Computation, 148(1): 1-70, 1999. 

2. M. Abadi and J. Jiirjens. Formal eavesdropping and its computational interpretation. In Proc. 
4th International Symposium on Theoretical Aspects of Computer Software (TACS), pages 
82-94, 2001. 

3. M. Abadi and P. Rogaway. Reconciling two views of cryptography (the computational sound- 
ness of formal encryption). Journal of Cryptology, 15(2): 103-127, 2002. 

4. M. Backes and C. Jacobi. Cryptographically sound and machine-assisted verification of secu- 
rity protocols. In Proc. 20th Annual Symposium on Theoretical Aspects of Computer Science 
(STACS), volume 2607 of Lecture Notes in Computer Science, pngos 675-686. Springer, 2003. 

5. M. Backes and B. Pfitzmann. A cryptographically sound security proof of the Needham- 
Schroeder-Lowe public-key protocol. lACR Cryptology ePrint Archive 2003/121, June 2003. 
http : // epr int . iacr . org/. 

6. M. Backes, B. Pfitzmann, and M. Waidner. A universally composable cryptographic library. 
IACR Cryptology ePrint Archive 2003/015, Jan. 2003. http : // eprint . iacr . org/. 

7. M. Bellare, A. Desai, D. Pointcheval, and P. Rogaway. Relations among notions of security 
for public-key encryption schemes. \n Advances in Cryptology: CRYPTO ’98, volume 1462 
of Lecture Notes in Computer Science, pages 26^5. Springer, 1998. 

8. M. Bellare and P. Rogaway. Entity authentication and key distribution. In Advances in 
Cryptology: CRYPTO ’93, volume 773 of Lecture Notes in Computer Science, pages 232- 
249. Springer, 1994. 

9. M. Burrows, M. Abadi, and R. Needham. A logic for authentication. Technical Report 39, 
SRC DIGITAL, 1990. 

10. R. Canetti. Universally composable security: A new paradigm for cryptographic protocols. In 
Proc. 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 136-145, 
2001 . 

11. D. Dolev and A. C. Yao. On the security of public key protocols. IEEE Transactions on 
Information Theory, 29(2): 198-208, 1983. 

12. O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game -or- a completeness 
theorem for protocols with honest majority. In Proc. 1 9th Annual ACM Symposium on Theory 
of Computing (STOC), pages 218-229, 1987. 

13. S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System 
Sciences, 28:270-299, 1984. 

14. S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof 
systems. SIAM Journal on Computing, 18(1): 186-207, 1989. 

15. R. Kemmerer. Analyzing encryption protocols using formal verification techniques. IEEE 
Journal on Selected Areas in Communications, 7(4):448^57, 1989. 




12 



M. Backes and B. Pfitzmann 



16. G. Lowe. An attack on the Needham-Schroeder public-key authentication protocol. Infor- 
mation Processing Letters, 56(3):131~135, 1995. 

17. G. Lowe. Breaking and fixing the Needham-Schroeder public-key protocol using FDR. In 
Proc. 2nd International Conference on Tools and Algorithms for the Construction and Analysis 
of Systems (TACAS), volume 1055 of Lecture Notes in Computer Science, pages 147-166. 
Springer, 1996. 

18. C. Meadows. Using narrowing in the analysis of key management protocols. In Proc. lOth 
IEEE Symposium on Security & Privacy, pages 138-147, 1989. 

19. C. Meadows. Analyzing the Needham-Schroeder public key protocol: A comparison of two 
approaches. In Proc. 4th European Symposium on Research in Computer Security (ESORICS), 
volume 1 146 of Lecture Notes in Computer Science, pages 35 1-364. Springer, 1996. 

20. J. K. Millen. The interrogator: A tool for cryptographic protocol security. In Proc. 5th IEEE 
Symposium on Security & Privacy, pages 134-141, 1984. 

21. R. Needham and M. Schroeder. Using encryption for authentication in large networks of 
computers. Communications of the ACM, 12(21):993-999, 1978. 

22. L. Paulson. The inductive approach to verifying cryptographic protocols. Journal of Cryp- 
tology, 6(1):85-128, 1998. 

23. B. Pfitzmann, M. Schunter, and M. Waidner. Provably secure certified mail. Research Report 
RZ 3207, IBM Research, 2000. 

http : //www. Zurich. ibm. com/ security/publications/. 

24. B. Pfitzmann and M. Waidner. A model for asynchronous reactive systems and its application 
to secure message transmission. In Proc. 22nd IEEE Symposium on Security & Privacy, 
pages 184-200, 2001. 

25. S. Schneider. Verifying authentication protocols with CSP. In Proc. 10th IEEE Computer 
Security Eoundations Workshop (CSEW), pages 3-17, 1997. 

26. P. Syverson. A new look at an old protocol. Operation Systems Review, 30(3):!^, 1996. 

27. F. J. Thayer Fabrega, J. C. Herzog, and J. D. Guttman. Strand spaces: Why is a security 
protocol correct? In Proc. 19th IEEE Symposium on Security & Privacy, pages 160-171, 
1998. 

28. B. Warinschi. A computational analysis of the Needham-Schroeder-(Lowe) protocol. In Proc. 
16th IEEE Computer Security Foundations Workshop ( CSEW), pages 248-262, 2003. 




Constructions of Sparse Asymmetric Connectors 

Extended Abstract 



Andreas Baltz, Gerold Jäger, and Anand Srivastav 



Mathematisches Seminar, Christian-Albrechts-Universität zu Kiel, 
Christian-Albrechts-Platz 4, D-24118 Kiel, Germany 
{aba, gej, asr}@numerik.uni-kiel.de 



Abstract. We consider the problem of connecting a set $I$ of $n$ inputs to 
a set $O$ of $N$ outputs ($n \le N$) by as few edges as possible, such that for 
each injective mapping $f : I \to O$ there are $n$ vertex disjoint paths from 
$i$ to $f(i)$ of length $k$ for a given $k \in \mathbb{N}$. For $k = \Omega(\log N + \log^2 n)$ Oruç 
[5] gave the presently best $(n, N)$-connector with $O(N + n \cdot \log n)$ edges. 
For $k = 2$ we show by a probabilistic argument that an optimal $(n, N)$- 
connector has $\Theta(N)$ edges, if $n \le N^{1/2 - \varepsilon}$ for some $\varepsilon > 0$. Moreover, we 
give explicit constructions based on a new number theoretic approach 
that need $O(N n^{1/2} + N^{1/2} n^{3/2})$ edges. 



1 Introduction 

A major task in the design of communication networks is to establish sparse 
connections between n inputs and N outputs that allow all the inputs to send 
information to arbitrary distinct outputs simultaneously. In the usual graph 
model, the problem can be stated as follows. 

Given $n, N \in \mathbb{N}$ ($n \le N$), construct a digraph $G = (V, E)$, where $V = 
I \cup L \cup O$ is partitioned into input vertices, link vertices and output vertices 
such that 

• $|I| = n$, $|O| = N$, 

• for every injective mapping $f : I \to O$ there are vertex disjoint paths 
connecting $i$ to $f(i)$ for all $i \in I$, 

• $|E|$ is small, or even minimum. 

We call a digraph as above an $(n, N)$-connector (well-known in the literature also as 
rearrangeable network, permutation network and $(N, n)$-permuter). An $(n, N, k)$- 
connector (or $(n, N)$-connector of depth $k$) is an $(n, N)$-connector where any 
output can be reached by any input via a path of length $k$. Let $e(n, N, k)$ denote 
the minimum number of edges sufficient for building an $(n, N, k)$-connector. 

Previous Work 

The size of $e(n, N, k)$ in the symmetric case $N = n$ is well-studied. Pippenger 
and Yao [8] proved that $e(n, n, k) = \Omega(n^{1+1/k})$ and showed by a probabilistic 
argument that $e(n, n, k) = O(n^{1+1/k} (\log n)^{1/k})$. The best explicit construction 
of sparse symmetric connectors with odd depth is also due to Pippenger [7], who 
showed how to build $(n, n, 2j + 1)$-connectors with $O(n^{1+1/(j+1)})$ edges. Hwang 
and Richards [4] gave a construction of an $(n, n, 2)$-connector with $O(n^{5/3})$ edges 
that can be extended by Pippenger's method to yield $(n, n, 2j)$-connectors with 
$O(n^{1+2/(3j-1)})$ edges ($j \ge 2$) [3]. Fewer results are known for asymmetric connectors. 
Oruç [5] was the first to devise $(n, N)$-connectors for arbitrary $n$ and $N$. 
In particular, he gave constructions of $(n, N, \Omega(\log_2 N + \log_2 n))$-connectors and 
$(n, N, \Omega(\log_2 N + \log_2^2 n))$-connectors with $O((N + n) \log_2 n)$ and $O(N + n \log_2 n)$ 
edges, respectively, relying on recursive Clos networks [1] and concentrators as 
building blocks. 

A weak lower bound on $e(n, N, k)$ can be obtained from the minimal crosspoint 
complexity of sparse crossbar concentrators. A sparse $(a, b)$-crossbar of capacity 
$c$ connects an $a$-element vertex set $A$ with a set $B$ of $b$ vertices in such a way that 
every $c$-element subset of $A$ can be perfectly matched into $B$. Oruç and Huang 
[6] proved a lower bound on the minimum number of crosspoints (= edges) in a cascade of 
$k$ sparse crossbars with capacity $c$ establishing paths of length $k$ between an 
$a$-element and a $b$-element set. Since an $(n, N)$-connector of depth $k$ is such a 
$k$-cascade with $a = N$ and $b = c = n$, we have the following bound: 

$e(n, N, k) \ge N - n + k + k(n - 1)(N - n + 1)^{1/k}$    (1) 

Oruç and Huang showed by an explicit construction that for $k = 1$ their bound 
is attainable (within a factor of 2) when $a - b < c < \sqrt{b}$. For other choices of 
parameters the given bound is unlikely to be tight. 
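
As a quick numerical illustration of bound (1), the following sketch evaluates its right-hand side. The function name `lower_bound` is hypothetical, and rounding up to the next integer is an assumption made here because an edge count is integral; it is not stated in the text.

```python
import math


def lower_bound(n: int, N: int, k: int) -> int:
    """Right-hand side of bound (1), rounded up (rounding is an assumption)."""
    # N - n + k + k (n - 1) (N - n + 1)^(1/k)
    value = N - n + k + k * (n - 1) * (N - n + 1) ** (1.0 / k)
    # Floating-point roots are inexact; for perfect powers the result
    # may be off by one, which is acceptable for an illustration.
    return math.ceil(value)


if __name__ == "__main__":
    for n, N in [(16, 2048), (24, 6000), (28, 10000)]:
        print((n, N), [lower_bound(n, N, k) for k in (1, 2, 3)])
```

For $k = 1$ the expression simplifies to $n(N - n + 1)$, which is only slightly below the $nN$ edges of the complete bipartite graph.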

Our Results 

We are interested in $(n, N, k)$-connectors with constant $k$ and $n \ll N$. Note 
that for $k = 1$ the only solution is the complete bipartite graph $K_{n,N}$ (so-called 
crossbar). The case $k = 2$ requires new methods and ideas, because the 
standard approach via expander graphs leads to sparse connectors only if $k$ is 
of order $\Omega(\log N)$ [6]. In practical applications (design of electronic switches) 
one typically has $n \le \sqrt{N}$. In section 3 we will determine the size of an optimal 
$(n, N)$-connector of depth 2 for $n \le N^{1/2 - \varepsilon}$ as $\Theta(N)$. As our main 
result we will give explicit constructions for $(n, N, 2)$-connectors with at most 
$(1 + o(1)) \cdot (N n^{1/2} + 2 N^{1/2} n^{3/2})$ edges. We propose new techniques for such 
constructions based on elementary number theory (section 4, Theorem 2 and section 
5, Theorem 3). In section 6 we iterate the depth-2 construction. Interestingly, 
the iteration leads to a depth-3 connector improving over the well-known 3-stage 
Clos network [1] by a factor of 1.5 (section 7). 

2 Sparse (n, N, 2)-Connectors 

We propose connectors with the following general structure: the outputs are con- 
nected to the link vertices via few edges, whereas the link vertices are completely 
connected to the input vertices. 

This structure reduces the problem of finding vertex disjoint paths to that of 
finding a perfect matching between link and output vertices. Moreover, Hall’s 
theorem provides a handy connector condition. 

Proposition 1 (connector condition). A digraph with vertices partitioned 
into sets $I$, $L$ and $O$, $|I| = n$, $|O| = N$, is an $(n, N, 2)$-connector, if $I$ and $L$ are 
completely connected and $|\Gamma(S)| \ge |S|$ for all $S \subseteq O$ with $|S| \le n$. 
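
The connector condition can be checked directly on small instances. The sketch below is a brute-force test that enumerates all subsets of outputs of size at most $n$; the function name and the dictionary representation of the neighborhoods $\Gamma(x)$ are assumptions made for illustration, and the running time is exponential in $n$.

```python
from itertools import combinations


def satisfies_connector_condition(neighbors, n):
    """Check |Gamma(S)| >= |S| for all S subset of O with 1 <= |S| <= n.

    `neighbors` maps every output vertex x to the set Gamma(x) of link
    vertices adjacent to x (hypothetical representation).
    """
    outputs = list(neighbors)
    for size in range(1, n + 1):
        for subset in combinations(outputs, size):
            # Joint neighborhood Gamma(S) of the chosen outputs.
            joint = set().union(*(neighbors[x] for x in subset))
            if len(joint) < size:
                return False
    return True
```

For instance, three outputs with neighborhoods {0, 1}, {1, 2} and {0, 2} satisfy the condition for $n = 3$, whereas two outputs that both see only link vertex 0 violate it for $n = 2$.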

Let us assume that 

$\deg(x) = d$ for all $x \in O$,    (2) 

where $d \in \mathbb{N}$ is constant. This condition is very natural, since all output vertices 
have the same significance. 



3 An Optimal (n, N, 2)-Connector 



Theorem 1. Let $n \le N^{1/2 - \varepsilon}$ for some $0 < \varepsilon < 1/2$. An optimal $(n, N)$-connector 
of depth 2 has $\Theta(N)$ edges. 



Proof: The proof of the lower bound is trivial, since each of the $N$ output 
vertices must at least have degree 1. For the proof of the upper bound let $|L| = 
\sqrt{N}$. For each output vertex pick at random $d$ link vertices (we will determine 
$d$ such that the construction is successful). If Hall's condition is not satisfied, 
there is a subset $S \subseteq O$ of size at most $n$ whose neighborhood is smaller than $|S|$. 
Consequently, there must be no edges connecting $S$ to the $|L| - |\Gamma(S)|$ remaining 
link vertices. The probability of these bad events is bounded by 

$\sum_{i=1}^{n} \binom{N}{i} \binom{\sqrt{N}}{i} \left(1 - \frac{\sqrt{N} - i}{\sqrt{N}}\right)^{di}$. 

We have to ensure that the latter term is smaller than 1, which is true for 
$(n/\sqrt{N})^d \le 1/(2N^2)$ and thus for 

$d \ge \frac{\log(2N^2)}{\log \sqrt{N} - \log n} = \frac{2 \log N + \log 2}{\log \sqrt{N} - \log n}$, 

which is guaranteed by 

$d \ge \frac{2 \log N + \log 2}{\frac{1}{2} \log N - (\frac{1}{2} - \varepsilon) \log N} = \frac{2 \log N + \log 2}{\varepsilon \log N} = O(1)$. 

So, there is a connector with $N \cdot d + \sqrt{N} \cdot n = O(N)$ edges. □ 



4 Explicit Construction I: Diagonal Method 

The connector condition suggests that the neighborhoods of the output vertices 
should be “almost distinct”. We propose the following additional condition, 
meaning that two elements of $O$ may share at most one neighbor: 

$|\Gamma(x) \cap \Gamma(x')| \le 1$ for all $x \ne x' \in O$    (3) 



Lemma 1. The connector condition is true if (2), (3) and $d \ge \sqrt{n}$ hold. 
Proof: Let $1 \le l \le n$ and $S \subseteq O$ with $|S| = l$ be given. 

For the sake of a contradiction we assume $|\Gamma(S)| < l$. There are $ld$ edges between 
$S$ and $\Gamma(S)$, so by the pigeon-hole principle there is a $y \in \Gamma(S) \subseteq L$ with 
$|\Gamma(y) \cap S| \ge d + 1$. Because of (2) and (3) the neighbors of $y$ in $S$ are connected 
to $(d + 1)(d - 1)$ link vertices in addition to $y$, and hence, 

$n \ge l > |\Gamma(S)| \ge |\Gamma(\Gamma(y) \cap S)| \ge (d + 1)(d - 1) + 1 = d^2$, 

contradicting $d \ge \sqrt{n}$. □ 



Moreover, we can derive a lower bound on the cardinality of $L$: 

Condition (3) implies that each $y \in L$ is the joint neighbor of at most $\frac{|L| - 1}{d - 1}$ 
vertices from $O$ (we count the number of possible completions of $y$ to a $d$-element 
neighbor set). So we have 

$|L| \cdot \frac{|L| - 1}{d - 1} \ge Nd$, 

giving 

$|L| \ge \frac{1}{2} + \sqrt{\frac{1}{4} + N(d^2 - d)} \ge \frac{1}{2} + \sqrt{\frac{1}{4} + N(n - \sqrt{n})}$. 

Interestingly, this bound is attained by combinatorial designs. 

Definition 1 ([2]). Let $l \ge d \ge t$. A $t$-$(l, d, \lambda)$-design is a pair $(L, B)$, where 
$|L| = l$ and $B \subseteq \binom{L}{d}$, and each $t$-element subset of $L$ is contained in $\lambda$ elements 
of $B$. 

We are interested in 2-$(l, \sqrt{n}, 1)$-designs. 

Proposition 2. If $(L, B)$ is a 2-$(l, \sqrt{n}, 1)$-design with $|B| = N$, then 
$l = \frac{1}{2} + \sqrt{\frac{1}{4} + N(n - \sqrt{n})}$. 

Clearly, 2-$(l, d, 1)$-designs can exist only for certain choices of parameters. As we 
do not require each 2-element set to be contained in exactly one, but in at most one 
element of $B$, we are actually looking for combinatorial 2-$(l, \sqrt{n}, 1)$-packings. 

Definition 2 ([2]). Let $l \ge d \ge t$. A $t$-$(l, d, \lambda)$-packing is a pair $(L, B)$, where 
$|L| = l$ and $B \subseteq \binom{L}{d}$, and each $t$-element subset of $L$ is contained in at most $\lambda$ 
elements of $B$. 

Unfortunately, it is not known how to construct optimal 2-$(l, \sqrt{n}, 1)$-packings for 
all values of $l$ and $n$. However, the following number theoretic method approximates 
the above bound quite closely. 



The Diagonal Method 

1. Let $O = \{0, \ldots, N - 1\}$. 

2. $d := \lceil \sqrt{n} \rceil$. 

3. $q := \min\{q' \in \mathbb{N}_{\ge \lceil \sqrt{N} \rceil} \mid$ for all $p \in \{2, 3, \ldots, d - 1\} : p \nmid q'\}$. 

4. $L := \{0, \ldots, qd - 1\}$ (we assume $L$ and $O$ to be disjoint, though!). 

5. For $x \in O$ let $x = x_1 + q \cdot x_2$ ($x_1, x_2 \in \{0, \ldots, q - 1\}$) be the $q$-ary 
decomposition of $x$. Choose 

$\Gamma(x) := \{(i - 1)(1 + x_2 \cdot d) + x_1 \cdot d \pmod{qd} \mid i \in \{1, \ldots, d\}\}$. 
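
The five steps can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that step 3 searches for $q$ among the integers $\ge \lceil \sqrt{N} \rceil$ and that step 5 uses the decomposition $x = x_1 + q x_2$. The helper `shares_at_most_one_neighbor` checks condition (3) by verifying that no pair of link vertices occurs in two different neighborhoods.

```python
import math
from itertools import combinations


def ceil_sqrt(m):
    """Smallest integer r with r * r >= m (for m >= 1)."""
    return math.isqrt(m - 1) + 1


def diagonal_method(n, N):
    """Return (d, q, neighbors) with neighbors[x] = Gamma(x) for x in O."""
    d = ceil_sqrt(n)                                   # step 2
    q = ceil_sqrt(N)                                   # step 3: first candidate
    while any(q % p == 0 for p in range(2, d)):        # no p in {2..d-1} divides q
        q += 1
    neighbors = {}
    for x in range(N):                                 # step 1: O = {0..N-1}
        x2, x1 = divmod(x, q)                          # step 5: x = x1 + q * x2
        neighbors[x] = {
            ((i - 1) * (1 + x2 * d) + x1 * d) % (q * d)
            for i in range(1, d + 1)
        }
    return d, q, neighbors


def shares_at_most_one_neighbor(neighbors):
    """Condition (3): two outputs have at most one common link vertex."""
    seen = set()
    for gamma in neighbors.values():
        for pair in combinations(sorted(gamma), 2):
            if pair in seen:
                return False
            seen.add(pair)
    return True
```

With $|E| = Nd + |L| n$ and $|L| = qd$, the instance $(n, N) = (16, 2048)$ gives $d = 4$, $q = 47$ and $2048 \cdot 4 + 188 \cdot 16 = 11200$ edges.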



Theorem 2. The diagonal method yields connectors with 
$(1 + o(1))(N n^{1/2} + 2 N^{1/2} n^{3/2})$ edges. 

Proof: We have to show that condition (3) is satisfied. Consider $x, x' \in O$ with 
$|\Gamma(x) \cap \Gamma(x')| \ge 2$. By step 5 we have 

$(i_1 - 1)(1 + x_2 d) + x_1 d \equiv (i_1 - 1)(1 + x_2' d) + x_1' d \pmod{qd}$ 

$(i_2 - 1)(1 + x_2 d) + x_1 d \equiv (i_2 - 1)(1 + x_2' d) + x_1' d \pmod{qd}$ 

for some $i_1 \ne i_2 \in \{1, \ldots, d\}$. Subtracting these relations we conclude 

$qd \mid (i_2 - i_1) \cdot (x_2 - x_2') \cdot d$ 

and thus 

$q \mid (x_2 - x_2')$, 

since $\gcd(q, i_2 - i_1) = 1$ by the choice of $q$. As $|x_2 - x_2'| < q$, this implies $x_2 = x_2'$; 
the first relation then gives $x_1 d \equiv x_1' d \pmod{qd}$, hence $x_1 = x_1'$ and so $x = x'$. 
The claim follows by Lemma 1 using that $|E| = Nd + |L| n$ and the fact that 
$q < 2\sqrt{N}$ (by Bertrand's theorem there is a prime between $\sqrt{N}$ and $2\sqrt{N}$, see 
[9]). □ 

Fig. 1. Example: d = 4, q = 5 



Remark 1. The diagonal method's name reflects the fact that the neighborhoods 
in step 5 can be obtained as rows of the following matrices: arrange the set 
$\{0, \ldots, qd - 1\}$ into a $q \times d$ matrix; build a second matrix by taking diagonals of 
the first matrix; arrange the diagonals of the second matrix into a third matrix; 
etc. 

We can slightly improve the construction by including d-element subsets of the 
“transpose” of one of the above matrices as additional neighborhoods, i.e. for 
the example considered we can get the following additional sets: 



{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}. 

Remark 2. As mentioned above, the diagonal method can be improved in certain 
cases by constructing the neighborhoods as optimal combinatorial packings. We 
will refer to this construction as the packing variant of the diagonal method. 

5 Explicit Construction II: Copy Method 

The following method assigns each element two neighbors in some small subset 
of L, and subsequently copies this neighborhood several times in order to satisfy 
the connector condition. For odd d an additional neighbor is added to the last 
copy. 



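The Copy Method 

1. Let $O = \{0, \ldots, N - 1\}$. 

2. $d := \lceil \sqrt{n} \rceil$. 

3. $q := \lceil \sqrt{N} \rceil$. 

4. $L := \{0, \ldots, qd - 1\}$ (still, we assume $L$ and $O$ to be disjoint!). 

5. For $x \in O$ let $x = x_1 + q \cdot x_2$ ($x_1, x_2 \in \{0, \ldots, q - 1\}$) be the $q$-ary 
decomposition of $x$. Choose $\Gamma(x) := \bigcup_{i \in \{0, \ldots, \lfloor d/2 \rfloor - 1\}} \Gamma_i(x)$, where 

$\Gamma_i(x) := \{x_1,\ x_2 + q,\ (x_1 + x_2) \pmod{q} + 2q\} + \frac{d-3}{2} \cdot 2q$, if $i = \frac{d-3}{2}$, $d$ odd, 
$\Gamma_i(x) := \{x_1,\ x_2 + q\} + i \cdot 2q$ otherwise. 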
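
A minimal Python sketch of the copy method is given below, under the assumptions that $d = \lceil \sqrt{n} \rceil$ and $q = \lceil \sqrt{N} \rceil$, that $d \ge 2$, and that for odd $d$ the third neighbor is added to the last copy $i = (d-3)/2$. The function names are hypothetical; `hall_condition_holds` is a brute-force check intended for small instances only.

```python
import math
from itertools import combinations


def ceil_sqrt(m):
    """Smallest integer r with r * r >= m (for m >= 1)."""
    return math.isqrt(m - 1) + 1


def copy_method(n, N):
    """Return (d, q, neighbors) with neighbors[x] = Gamma(x); assumes d >= 2."""
    d, q = ceil_sqrt(n), ceil_sqrt(N)
    copies = d // 2                                    # i = 0 .. floor(d/2) - 1
    neighbors = {}
    for x in range(N):
        x2, x1 = divmod(x, q)                          # x = x1 + q * x2
        gamma = set()
        for i in range(copies):
            block = {x1, x2 + q}
            if d % 2 == 1 and i == copies - 1:         # last copy, d odd
                block.add((x1 + x2) % q + 2 * q)
            gamma |= {y + i * 2 * q for y in block}    # shift the copy by i * 2q
        neighbors[x] = gamma
    return d, q, neighbors


def hall_condition_holds(neighbors, n):
    """Brute force: |Gamma(S)| >= |S| for all S with 1 <= |S| <= n."""
    for size in range(1, n + 1):
        for subset in combinations(neighbors, size):
            if len(set().union(*(neighbors[x] for x in subset))) < size:
                return False
    return True
```

For $(n, N) = (16, 2048)$ this yields $d = 4$, $q = 46$ and $Nd + qdn = 11136$ edges.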



To prove the connector condition we need the following lemma. 



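Lemma 2. Let $l \in \mathbb{N}$, $S \subseteq \mathbb{N}_0$ with $|S| = l$. 

a) Let $f = (f_1, f_2) : S \to \mathbb{N}_0^2$ be injective, then $|f_1(S)| + |f_2(S)| \ge 2\sqrt{l}$. 

b) If $f = (f_1, f_2, f_3) : S \to \mathbb{N}_0^3$ is such that any two of its components form an 
injective map, then $|f_1(S)| + |f_2(S)| + |f_3(S)| \ge 3\sqrt{l}$. 

Proof: a) By the pigeon-hole principle, there is an $x \in f_1(S)$ with 

$|f_1^{-1}(x)| \ge \frac{l}{|f_1(S)|}$. 

As $f$ is injective, we have 

$|f_2(S)| \ge |f_1^{-1}(x)| \ge \frac{l}{|f_1(S)|}$ 

and thus 

$|f_1(S)| + |f_2(S)| = \left(\sqrt{|f_1(S)|} - \sqrt{|f_2(S)|}\right)^2 + 2\sqrt{|f_1(S)| \cdot |f_2(S)|} \ge 2\sqrt{l}$. 

b) follows from a), since $|f_1(S)| + |f_2(S)| + |f_3(S)| = \frac{1}{2}\left(|f_1(S)| + |f_2(S)| + 
|f_1(S)| + |f_3(S)| + |f_2(S)| + |f_3(S)|\right)$. □ 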







Remark 3. The claim of Lemma 2 b) can be strengthened as follows: 



l/l(^)l + l/2(^)| + |/3(^)l > 




This allows us to decrease d by 1 in certain cases (see the example (n, N) = 
(28, 10000) in section 7). 



Theorem 3. The copy method yields connectors with $(1 + o(1))(N n^{1/2} + N^{1/2} n^{3/2})$ 
edges. 

Proof: We have to show that the neighborhoods constructed by the copy 
method satisfy Hall's condition. Let $S \subseteq O$ with $|S| \le n$. The function 

$f = (f_1, f_2) : S \to \mathbb{N}_0^2$, $x \mapsto (x_1, x_2)$, 

where $x_1 + q \cdot x_2$ is the $q$-ary decomposition of $x$, is injective. By construction of 
$\Gamma_0(x)$, and Lemma 2 a), 

$|\Gamma_0(S)| = |f_1(S)| + |f_2(S)| \ge 2\sqrt{|S|}$. 

For $d$ even, we obtain: 

$|\Gamma(S)| = \frac{d}{2} \cdot |\Gamma_0(S)| \ge \frac{d}{2} \cdot 2\sqrt{|S|} \ge |S|$. 

For $d$ odd, we have: 

$|\Gamma(S)| = \frac{d-3}{2} \cdot |\Gamma_0(S)| + |\Gamma_{\frac{d-3}{2}}(S)| \ge (d - 3)\sqrt{|S|} + 3\sqrt{|S|} \ge |S|$. 

The number of edges can be estimated as in the proof of Theorem 2. □ 



6 Iteration of the Methods 

Our construction methods can be iterated to yield $(n, N, k)$-connectors for 
arbitrary $k \ge 2$. 



Iteration of the Diagonal/Copy Method 

1. Let $k$ be the depth of the connector. 

2. Let $L_0 := I$ and $L_k := O$. 

3. For $s = 0, \ldots, k - 2$: 

Apply the diagonal or copy method to $n := |L_0|$ and $N := |L_{k-s}|$. 
Define $L_{k-s-1} := L$. 

4. Connect $L_0$ and $L_1$ by a complete bipartite graph. 
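
The edge count produced by this iteration can be sketched as follows. The function names are hypothetical, and the sketch assumes $d = \lceil \sqrt{n} \rceil$ in both methods, $q = \lceil \sqrt{N} \rceil$ for the copy method, and the smallest admissible $q \ge \lceil \sqrt{N} \rceil$ for the diagonal method. Each application contributes $|L_{k-s}| \cdot d$ edges, and the final complete bipartite graph contributes $n \cdot |L_1|$ edges.

```python
import math


def ceil_sqrt(m):
    """Smallest integer r with r * r >= m (for m >= 1)."""
    return math.isqrt(m - 1) + 1


def link_layer(n, N, method):
    """Return (d, |L|) for one application of the chosen method."""
    d = ceil_sqrt(n)
    q = ceil_sqrt(N)
    if method == "diagonal":
        # q must not be divisible by any p in {2, ..., d-1}.
        while any(q % p == 0 for p in range(2, d)):
            q += 1
    return d, q * d


def iterated_edge_count(n, N, k, method):
    """Number of edges of the depth-k connector built by the iteration."""
    edges = 0
    layer = N                                # |L_k| = N
    for _ in range(k - 1):                   # s = 0, ..., k-2
        d, links = link_layer(n, layer, method)
        edges += layer * d                   # every vertex of L_{k-s} has degree d
        layer = links                        # |L_{k-s-1}|
    return edges + n * layer                 # complete bipartite graph L_0, L_1
```

Under these assumptions the sketch gives 10032 (diagonal) and 9824 (copy) edges for $(n, N, k) = (16, 2048, 3)$; for $k = 1$ it degenerates to the crossbar with $nN$ edges. It does not apply the reduction of $d$ mentioned in Remark 3.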

We can reformulate the connector condition for depth $k$: 

Proposition 3. A digraph with vertices partitioned into sets $L_0, L_1, \ldots, L_{k-1}$, 
and $L_k$, $|L_0| = n$, $|L_k| = N$, is an $(n, N, k)$-connector, if $L_0$ and $L_1$ are completely 
connected and $|\Gamma(S) \cap L_{i-1}| \ge |S|$ for all $i \in \{1, \ldots, k - 1\}$ and all $S \subseteq L_i$ with 
$|S| \le n$. 

For connectors of depth k we can show by induction on k: 

Theorem 4. Let ao := for s = 1,2, ... ,k—l and ak-s ■= 

■ Then k — 1 iterations of the copy method yield connectors 

with (1 + o(l)) • Os) edges. 

For the proof we refer to the full version. 

Remark 4. We can get an analogous result for the diagonal method. 

7 Comparison of the Methods 

The following table compares the diagonal and copy method for several values 
of n and N. In addition we list values obtained from the packing version of 
the diagonal method (see Remark 2). Apart from the case k = 2, we include 
the trivial case k = 1 and the case k = 3 for comparison with the asymmetric 
3-stage Clos network [1]. Additionally, we compare the results with the lower 
bound for e{n,N,k), see (1). 



Table 1. 

(n, N, k)         Diag      Copy      Pack      Clos      Low. bound 
(16, 2048, 1)     32768     32768     32768     -         32528 
(16, 2048, 2)     11200     11136     10720     -         3387 
(16, 2048, 3)     10032     9824      9544      16448     2606 
(24, 6000, 1)     144000    144000    144000    -         143448 
(24, 6000, 2)     39480     39360     38376     -         9535 
(24, 6000, 3)     34735     34350     33785     58920     7232 
(28, 10000, 1)    280000    280000    280000    -         279244 
(28, 10000, 2)    76968     64000     75568     -         15367 
(28, 10000, 3)    68508     56500     66588     106835    11719 

We see that the Clos method approximately improves the trivial method of 
depth 1 by a factor of at least 2. The diagonal or copy method nearly gives an 
additional improvement of a factor of more than 1.5. The copy method is slightly 
superior to the diagonal method, as can be seen from the Theorems 2 and 3. 
In the last example the copy method yields better results due to the rounding 
effect described in Remark 3. 

8 Open Questions 

1. Can we derandomize the probabilistic proof of Theorem 1? 

2. How can we construct a good $(n, N, 2)$-connector when $|L|$ is prespecified? 

3. What are efficient constructions for the multicast case where each of the $n$ 
inputs is allowed to connect to $r > 2$ specified outputs? 



Acknowledgments. We would like to thank Amnon Ta-Shma for the idea of 

the probabilistic proof of Theorem 1. 

Abstract. We define a separation logic (BI-Loc) that is an extension 
of the Bunched Implications (BI) logic with a modality for locations. 
Moreover, we propose a general data structure, called resource tree, that 
is a node-labelled tree in which nodes contain resources that belong to a 
partial monoid. We also define a resource tree model for this logic that 
allows to reason and prove properties on resource trees. We study the 
decidability by model checking of the satisfaction and the validity in this 
separation logic and also introduce a sequent calculus for deciding valid- 
ity by deduction w.r.t. a resource model. Then, we relate the separation 
logic and resource trees to some applications and finally define a sequent 
calculus for BI-Loc dedicated to a theorem proving approach. 



1 Introduction 

The notion of resource is a basic one in many fields, including computer science. 
The location, ownership, distribution, access to and, indeed, consumption of, 
resources are central concerns in the design of systems (such as networks) and 
also of programs which access memory and manipulate data structures (such 
as pointers). The logic of Bunched Implications (BI), a resource-aware 
logic in which additive ($\wedge$, $\rightarrow$) and multiplicative ($*$, $-\!*$) connectives cohabit [14, 
15], appears as a kernel logic of resource (de)composition, as illustrated by separation 
logics [12,13,16] and also by spatial logics for distributed and mobile 
computations [8]. It is natural, in this setting, to consider primitive notions of 
location or distribution and then to define a new separation logic, based on BI 
logic extended with a location modality, for reasoning on distributed resources. 
In addition, we aim to define a resource tree model for this logic that allows one to 
reason about, and prove properties of, structures based on resource trees. 

In this context, it is important to define models that reflect structures well, but 
also logics that are expressive enough to represent data properties and restricted 
enough to decide whether a given model satisfies a formula or whether some properties entail 
other properties. We observe in recent works on tree models (and their related 
logics) some limits for information representation, such as the difficulty of representing 
complex data inside nodes [4] and the choice of a resource composition that 
is not partial [7], knowing that partiality makes it possible to ensure that substructures 
are disjoint. Therefore, we study a new data model, called resource tree, that is 
a labelled tree structure in which nodes contain resources that are elements of 
a partially defined monoid, and we show how and why it is an appropriate proposal 
to deal with resource distribution. Then, we define a tree structure that 
represents complex information, through resources inside nodes, and also distinguishes 
the structure from the information it contains. Moreover, we can alter an 
existing structure and characterize inconsistent trees (not valid according to a 
specification). 

In order to define a logic for resource trees we consider the extension of the BI 
logic with a modality for locations (BI-Loc). It can be viewed as a separation and 
spatial logic: BI's multiplicatives naturally introduce the notion of resource 
separation, and the location modality makes it possible to gather resources in some locations 
and thus to introduce a notion of spatial distribution of resources. The use of a 
partial composition on resources and the introduction of a location modality in 
a resource-aware logic [10] are key points in this work. 

In section 2, we define what a resource tree is and illustrate its specific features. 
Section 3 presents the related separation logic BI-Loc and in order to analyze the 
decidability for resource trees against this logic, we show, in section 4, that BI is 
decidable by model-checking for partial resource monoids that satisfy some spe- 
cific conditions. This result is extended to resource trees and its separation logic. 
Moreover, we derive a sequent calculus for the logic that is sound and complete 
w.r.t. an interpretation in terms of satisfaction. In section 5, we consider some 
applications of resource trees and BI-Loc and analyze the relationships with 
other models and logics dealing with the notions of location [4] and hierarchy 
of resources [1]. To complete these results, we propose, in section 6, a sequent 
calculus for BI-Loc that allows to check the validity of a formula for all mod- 
els and then by a theorem proving approach, starting from some recent results 
about decidability of BI and a new complete resource semantics [11]. Section 7 
includes conclusions and perspectives. 



2 Resource Trees 

Our goal is to define an abstract model for reasoning about distributed resources. 
Such a model must be rich enough to represent complex information and to 
manage the information distribution. We want to divide the space into locations 
and to explicitly mention where the resources are. In this context, it seems 
natural to divide a location into sub-locations and then to provide a hierarchical 
representation of the space and thus define the resource tree notion. 

A resource tree is a finite tree with labelled nodes, each node containing resources 
which belong to a partial monoid of resources. In a resource tree, a path (list of 
label names) always leads to at most one node. More formally, a resource 
tree can be seen as a pair $(m, f)$ where $m$ is the resource contained in the root node 
and $f$ is a partial function which associates a resource tree to some labels. For 
instance, $(m, \{l \mapsto (m', nil)\})$ corresponds to a tree which contains the resource 
$m$ and a child at location $l$ which contains $m'$. Formally, resource trees can be 
defined as follows: 

Definition 1 (Resource Tree (1)). Given an enumerable set of names $\mathcal{L}$ and 
a partial resource monoid $\mathcal{M} = (M, \bullet, e, \sqsubseteq)$, the set of resource trees over $\mathcal{M}$ 
(denoted $T_{\mathcal{M}}$) is inductively defined as follows: $T_{\mathcal{M}} ::= M \times [\mathcal{L} \rightharpoonup T_{\mathcal{M}}]$. 

Moreover, the composition of resource trees (denoted $|$) can be used to combine 
either trees or resources. Due to the partiality of the monoid operator $\bullet$, the 
composition of resources $m$ and $m'$ (denoted $m \bullet m'$) may be undefined. Consequently, 
the composition of resource trees is also partially defined. Given two 
resource trees $T = (m, f)$ and $T' = (m', f')$, $T | T'$ is defined iff $m \bullet m'$ is defined 
and, for any location $l$ such that $f(l)$ and $f'(l)$ exist, $f(l) | f'(l)$ is defined. 
Then $T | T' = (m'', f'')$ with $m'' = m \bullet m'$ and, for any location $l$, $f''(l)$ is equal 
to $f(l) | f'(l)$ if $f(l)$ and $f'(l)$ are both defined, to $f(l)$ (resp. $f'(l)$) if $f'(l)$ (resp. 
$f(l)$) is undefined, and is undefined if both are undefined. The direct description 
of resource trees by a recursive domain equation is similar in spirit to the 
representation of heap models of BI [12], as used in separation logics. In related 
work, a different style of model representation has been used, where a grammar 
of terms is given together with a syntactic structural equivalence notion [4]. We 
also adopt such a representation here, since it provides a convenient language for writing 
down model elements. 
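
The partial composition of resource trees described above can be sketched in Python. Everything concrete here is an assumption made for illustration: resources are modelled as finite attribute maps whose composition is the union of maps with disjoint keys (undefined, represented by `None`, when keys overlap), and a tree is a pair of a resource and a dictionary from labels to subtrees. The names `compose_resources` and `compose_trees` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Resource = Dict[str, str]                      # hypothetical resource: attribute map
Tree = Tuple[Resource, Dict[str, "Tree"]]      # pair (m, f) as in Definition 1


def compose_resources(m: Resource, m2: Resource) -> Optional[Resource]:
    """Partial monoid operation: union of maps, undefined on key overlap."""
    if m.keys() & m2.keys():
        return None                            # composition undefined
    return {**m, **m2}                         # the empty map is the unit e


def compose_trees(t: Tree, t2: Tree) -> Optional[Tree]:
    """Composition T | T'; returns None when the composition is undefined."""
    (m, f), (m2, f2) = t, t2
    root = compose_resources(m, m2)
    if root is None:
        return None
    children: Dict[str, Tree] = {}
    for label in f.keys() | f2.keys():
        if label in f and label in f2:         # both defined: compose recursively
            child = compose_trees(f[label], f2[label])
            if child is None:
                return None
            children[label] = child
        else:                                  # defined in exactly one of the trees
            children[label] = f[label] if label in f else f2[label]
    return root, children
```

Composing a tree with a second tree that carries a resource only at location `l` adds that resource to the existing node at `l`; composing again with the same second tree is undefined, because the same attribute would be declared twice.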

Definition 2 (Resource Tree (2)). Given an enumerable set of names $\mathcal{L}$ and 
a partial resource monoid $\mathcal{M} = (M, \bullet, e, \sqsubseteq)$, a resource tree over $\mathcal{M}$ (denoted 
$T_{\mathcal{M}}$) is inductively defined as follows: 

$P ::= m \mid P|P \mid [l](P)$ (where $m \in M$ and $l \in \mathcal{L}$). 

For any resources $m, m' \in M$, $m | m' \equiv m \bullet m'$. 

A node labelled with $l$ which contains a subtree $P$ is denoted $[l](P)$. The 
empty tree corresponds to the neutral element of the monoid (denoted $e$). For 
instance, $[l](m \mid [l']m')$ represents a tree with a child $l$ containing a resource $m$ 
and a child $l'$ that contains a resource $m'$. 

We want a unique location to correspond to a unique path, and we handle the composition 
of trees having the same label on the root node (as in $[l]P \mid [l]Q$) by merging 
nodes with the same labels. Compared to other tree models [4], a major improvement 
is that nodes are not only labels but can also contain information 
(resources). Moreover, we have a unique corresponding location for a given path 
and the composition operator is partial. There is a structural equivalence $\equiv$ 
between resource trees that is defined as follows: 

- $P \equiv P$ - if $P \equiv Q$ then $Q \equiv P$ - if $P \equiv Q$ and $Q \equiv R$ then $P \equiv R$ 

- $P|Q \equiv Q|P$ - $(P|Q)|R \equiv P|(Q|R)$ - if $P \equiv Q$ then $P|R \equiv Q|R$ 

- $P|e \equiv P$ - $[l]P \mid [l]Q \equiv [l](P|Q)$ - if $P \equiv Q$ then $[l]P \equiv [l]Q$ 

The rule ($[l]P \mid [l]Q \equiv [l](P|Q)$) corresponds to the way we handle the composition 
of trees in case we must compose two sibling nodes with the same label. In 
a nutshell, the composition operator ($|$) merges nodes with the same label and 
composes others as the usual composition of trees. Then, the composition of two 
nodes with the same labels is equivalent to one node which contains resources 
and subtrees of these two nodes, as illustrated in figure 1. 

Fig. 1. Resource Tree Composition 

Moreover, we can inductively extend the $\sqsubseteq$ relation of the monoid to 
resource trees by defining the partial ordering relation $\preceq$ on resource trees as 
follows: 1) $m \preceq Q$ if and only if $Q \equiv m'$ and $m \sqsubseteq m'$; 2) $[l]P' \preceq Q$ if and only 
if $Q \equiv [l]Q'$ and $P' \preceq Q'$; and 3) $P'|P'' \preceq Q$ if and only if $Q \equiv Q'|Q''$, $P' \preceq Q'$ 
and $P'' \preceq Q''$. 

Let us note that a resource tree $P$ defined with Definition 2 corresponds 
to the following resource tree $\overline{P}$ with Definition 1: $\overline{m} = (m, nil)$, 
$\overline{P|P'} = \overline{P} \mid \overline{P'}$ and $\overline{[l]P} = (e, l \mapsto \overline{P})$. Then, we can prove that, given two 
resource trees $P$ and $P'$ defined with Definition 2, we have $P \equiv P'$ if and only 
if $\overline{P} = \overline{P'}$. 

We now illustrate some specific features of resource trees by considering the 
representation of an XML tree as a resource tree. Actually, we may represent the 
tree structure and the node contents as resources. The resource tree composition 
makes it possible to alter a tree structure in a fine-grained way. Actually one can compose, with 
the $|$ operator, an existing tree with another one which will add a resource subtree 
into a specific node. However, this property implies a less intuitive representation 
of information. In fact, information which occurs through labels in [4] does not 
correspond to labels in resource trees but is represented by resources inside nodes. 
It provides an easier way to represent complex information and distinguishes 
the shape of data (the tree structure) from information it contains (tags and 
attributes). In our data, a location name only refers to a specific part of the tree 
structure. 

Figure 2 gives a representation of an XML tree and illustrates how to add an
attribute to a given node. We have the representation of the XML file on the
left-hand side and the second tree shows how we can add the attribute id='42'
to the node corresponding to the first message. Let us note that attributes and
tags are both represented as resources. Our representation of attributes allows
us to describe pointer references (references to another part of the tree using the
id attribute) such as those presented in [6]. Moreover, we can also add a subtree
at a given node instead of just one resource.

    <message>
      <from>Alice</from>
      <to>Bob</to>
    </message>
    <message id='43'>
      <from>Bob</from>
      <to>Alice</to>
    </message>

    (P)    (P')    (P'')

Fig. 2. Representing and Altering XML Data

Let us suppose that, instead of adding the attribute to the first message, we add
it to the second one and that, as is the case in XML, the id attribute is unique
(i.e., you cannot declare two id attributes in the same node). Consequently, the
resulting tree does not correspond to valid XML data. The partiality of our
resource composition gives an easy way to handle this kind of invalid data: it is
sufficient to declare that any composition of a resource representing an id with
another one representing another id is undefined.
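As a rough illustration of this use of partiality, the following Python sketch models the resources of a node as a set of strings and makes the composition undefined (returning `None`) as soon as both operands carry an id. The encoding of resources as strings such as `"id=42"` and the function name `compose_resources` are assumptions made for the example only.

```python
def compose_resources(a, b):
    """Combine two resource sets; return None when the composition is undefined."""
    ids_a = {r for r in a if r.startswith("id=")}
    ids_b = {r for r in b if r.startswith("id=")}
    if ids_a and ids_b:
        # Two id resources would end up in the same node: undefined by assumption.
        return None
    return frozenset(a) | frozenset(b)


first_message = compose_resources({"message"}, {"id=42"})
second_message = compose_resources({"message", "id=43"}, {"id=42"})
```

Adding id='42' to the first message is defined, while adding it to the second message, which already carries id='43', is not.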

Compared to edge-labelled trees [4], resource trees allow the representation of
information inside nodes, whereas the former can only represent tag names and tree
structure. With the composition operator, we can alter an existing structure
instead of just adding information next to the existing one, and characterize
inconsistent trees (trees which are not valid according to a specification).

3 A New Separation Logic 

The BI logic is a resource-aware separation logic in which additives ($\wedge$, $\rightarrow$) and
multiplicatives ($*$, $-\!*$) cohabit. The decidability of propositional BI logic has
recently been proved, and a complete semantics based on partial monoids
has been defined for intuitionistic additives [11]. We define here an extension
of BI with a location modality (BI-Loc) and a resource tree model
for this logic, based on the satisfaction relation $P \models \varphi$. Then, we define
what a valid formula is and we show the correspondence between validity and
the satisfaction relation.

Definition 3. Given a set of location names $\mathcal{L}$ and a language of propositional
letters $\mathcal{P}$, the set of formulae of BI-Loc is inductively defined as follows:

$\varphi ::= [l]\varphi \mid p \mid I \mid \varphi * \varphi \mid \varphi \mathbin{-\!*} \varphi \mid \top \mid \bot \mid \varphi \vee \varphi \mid \varphi \wedge \varphi \mid \varphi \rightarrow \varphi$ (with $l \in \mathcal{L}$ and $p \in \mathcal{P}$)

The modality [-] is a spatial modality as discussed above. We now define a
resource tree model for this logic based on the partial monoid resource model
[11].

Definition 4 (Interpretation). Given a partial monoid of resources $\mathcal{M} =
(M, \cdot, e, \sqsubseteq)$ and a language $\mathcal{P}$ of propositional letters, an interpretation is a
function $[\![\cdot]\!] : \mathcal{P} \rightarrow \wp(M)$.



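Definition 5 (Resource Tree Model). A resource tree model is a triplet $\mathcal{T} =
(\mathcal{M}, [\![\cdot]\!], \models)$ such that $\mathcal{M} = (M, \cdot, e, \sqsubseteq)$ is a partial resource monoid, $[\![\cdot]\!]$ is an
interpretation and $\models$ is a forcing relation over $T_M \times \mathcal{P}(P)$ ($T_M$ is the set of
resource trees over $M$ and $\mathcal{P}(P)$ is the set of BI-Loc propositions) that satisfies

- $P \models [l]\varphi$ iff there exists $Q$ such that $[l]Q \preceq P$ and $Q \models \varphi$.

- $P \models I$ iff $e \preceq P$.

- $P \models \varphi * \psi$ iff there exist $Q, R$ such that $Q \mid R$ is defined, $Q \mid R \preceq P$, $Q \models \varphi$ and $R \models \psi$.

- $P \models \varphi \mathbin{-\!*} \psi$ iff for all $Q$ such that $Q \models \varphi$ and $P \mid Q$ is defined, $P \mid Q \models \psi$.

- $P \models \top$ always.

- $P \models \bot$ never.

- $P \models p$ iff there exists $m \in M$ such that $m \preceq P$ and $m \in [\![p]\!]$.

- $P \models \varphi \wedge \psi$ iff $P \models \varphi$ and $P \models \psi$.

- $P \models \varphi \vee \psi$ iff $P \models \varphi$ or $P \models \psi$.

- $P \models \varphi \rightarrow \psi$ iff $P \models \varphi$ implies that $P \models \psi$.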

This semantics relies on the partial monoid semantics of BI [11], but it deals 
with resource trees instead of resources. Consequently, we have extended the 
definition of the forcing relation in order to handle such trees and the [-] modality. 

Definition 6 (Satisfaction). Given a resource tree model $\mathcal{T} = (\mathcal{M}, [\![\cdot]\!], \models)$, a
resource tree $P$ over $M$ satisfies a formula $\varphi$ if and only if $P \models \varphi$ holds.

Let us clarify the meaning of some formulae, especially those involving units
and the $\rightarrow$ operator. First of all, $[l]\top$ (resp. $[l]I$) is not equivalent
to $\top$ (resp. $I$). The first formula indicates that there exists a child node called $l$
(which must be empty in the case of $[l]I$) whereas the second formula does not
ensure such an existence. Secondly, $\bot$ does not behave as the other units:
$[l]\bot$ is equivalent to $\bot$ and neither can be satisfied by a resource tree. Finally,
$[l](\varphi \rightarrow \varphi)$ is not equivalent to $[l]\varphi \rightarrow [l]\varphi$: the first one is satisfied
by any resource tree which has a child node $l$, while the second one is always
satisfied. An essential point is that the structural equivalence defined in Section
2 does not alter the satisfaction relation.

Lemma 1. Let $P$ and $Q$ be resource trees such that $P \equiv Q$ and $\varphi$ be a BI-Loc
formula. Then $P \models \varphi$ if and only if $Q \models \varphi$.

Proof. By structural induction. 

We can now define the validity of a BI-Loc formula: 

Definition 7 (Validity on a Resource Tree Model). Given a resource tree
model $\mathcal{T} = (\mathcal{M}, [\![\cdot]\!], \models)$ and a formula $\varphi$, $\mathcal{T} \models \varphi$ if and only if for any resource
tree $P$ whose resources belong to $\mathcal{M}$, we have $P \models \varphi$.

Validity on a resource tree model can be expressed as a satisfaction relation.

Lemma 2. Given a formula $\varphi$, $\mathcal{T} \models \varphi$ if and only if $e \models_{\mathcal{T}} \top \mathbin{-\!*} \varphi$.

Proof. By definition of the satisfaction relation.

Moreover, by extension, the validity of a formula is defined as follows:

Definition 8 (Validity). Given a formula $\varphi$, $\models \varphi$ if and only if for any
resource tree model $\mathcal{T} = (\mathcal{M}, [\![\cdot]\!], \models)$, $\mathcal{T} \models \varphi$.

Compared to the spatial logic for edge-labelled trees [4] which handles only 
units, our logic handles propositional letters and a location name always refers 
to the same location (which is neither an advantage nor a disadvantage, but just 
another view of locations). Our logic does not include a placement @ operator 
but we can use propositions and resources in order to embed the behavior of SLT 
locations, by adding quantifications on locations in BI-Loc, as discussed later. 

4 Decidability on a Resource Model 

Concerning the above-mentioned calculi (ambients, trees or pointers) and their 
spatial or separation logics, recent works have investigated the border between 
decidable and undecidable cases of model checking for a calculus and its related 
logic [4,5,9]. Let us recall that the model-checking problem is to decide whether a 
given object satisfies (is a model of) a given formula. In these logics, we observe 
that the decidability depends on interactions between the separation connectives 
($*$, $-\!*$), the classical ones and the ones introducing spatial modalities. One
central point is that the $-\!*$ operator introduces an infinite quantification on trees.
In order to obtain decidability by model checking, we must be able to bound
this quantification and to capture the interactions of $-\!*$ with other connectives.
These key points have been already identified, but in a different way, in the proof 
of the decidability of propositional BI logic [11]. 

4.1 Deciding Validity by Model Checking in BI 

A main problem is the infinite quantification introduced by the $-\!*$ operator.
It is not directly related to the tree structure and consequently we start by
focusing on the classical resource monoid and BI formulae. We propose
a restriction on the resource monoid which allows us to model check the satisfaction
and validity of a BI formula. We define sufficient conditions on monoids in order
to decide satisfaction by model checking.

Definition 9 (Boundable Resource Model). A resource model is said to be
boundable if:

1. For any $m \in M$ there exist $m_1, \ldots, m_n$ such that $m \sqsubseteq m_1 \cdot \ldots \cdot m_n$ and for
any $i$, $m_i$ is not decomposable (i.e., there is no $m', m''$ such that $m' \neq e$,
$m'' \neq e$ and $m_i \sqsubseteq m' \cdot m''$).

2. For every subset of propositions $L'$ and every integer $n$ there exists an
equivalence relation $\cong_{L',n}$ such that:

a) $e \cong_{L',n} m$ iff $e \sqsubseteq m$.

b) For any $m, m' \in M$ and $p \in L'$, if $m \cong_{L',n} m'$ then $m \in [\![p]\!]$ iff $m' \in [\![p]\!]$.

c) For any $m, m', m'' \in M$, if $m \cong_{L',n} m'$ and if $m \cdot m''$ is defined then
$m' \cdot m''$ is defined and $m \cdot m'' \cong_{L',n} m' \cdot m''$.

d) For any $k \leq n$ and $m, m' \in M$ such that $m \cong_{L',n} m'$, if there exist $m_1, \ldots, m_k$
such that $m \sqsubseteq m_1 \cdot \ldots \cdot m_k$ then there exist $m'_1, \ldots, m'_k$ such that
$m' \sqsubseteq m'_1 \cdot \ldots \cdot m'_k$ and for any $i \leq k$, $m_i \cong_{L',0} m'_i$.

e) $M/{\cong_{L',n}}$ is finite.

The above definition does not require the monoid to be finite; it only
requires that the monoid has a finite quotient for each equivalence relation. Some
instantiations of such models satisfying these requirements are given later.

We show that, for a given BI formula, there exists a finite equivalence class to
work on. Given a BI formula $\varphi$, we consider the set $L_\varphi$ of propositions which
occur in $\varphi$ and we define the size of $\varphi$ (denoted $s(\varphi)$) as follows:

Definition 10. Given a BI formula $\varphi$, the size of $\varphi$ is inductively defined by

- $s(I) = 1$ - $s(\top) = 0$ - $s(\varphi' \vee \varphi'') = \max(s(\varphi'), s(\varphi''))$

- $s(p) = 1$ - $s(\varphi' \mathbin{-\!*} \varphi'') = s(\varphi'')$ - $s(\varphi' \wedge \varphi'') = \max(s(\varphi'), s(\varphi''))$

- $s(\bot) = 0$ - $s(\varphi' * \varphi'') = s(\varphi') + s(\varphi'')$ - $s(\varphi' \rightarrow \varphi'') = \max(s(\varphi'), s(\varphi''))$

The size of a formula $\varphi$ determines the number of resource decompositions
that are necessary to decide whether $\varphi$ is satisfied by a resource or not. Our goal is
to show that checking whether a resource $m$ satisfies $\varphi \mathbin{-\!*} \psi$ corresponds to checking
whether $m \cdot m'$ satisfies $\psi$ for $m'$ ranging over a given subset $M'$ of $M$.
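The size function of Definition 10 translates directly into a recursive function. The sketch below assumes a hypothetical encoding of BI formulae as nested Python tuples (for instance `("star", a, b)` for a multiplicative conjunction and `("wand", a, b)` for the multiplicative implication); the encoding and the tag names are ours, only the defining equations come from the definition.

```python
def size(phi):
    """Size of a BI formula encoded as a nested tuple (hypothetical encoding)."""
    tag = phi[0]
    if tag in ("I", "atom"):
        return 1
    if tag in ("top", "bot"):
        return 0
    if tag == "star":
        # Multiplicative conjunction: sizes add up.
        return size(phi[1]) + size(phi[2])
    if tag == "wand":
        # Multiplicative implication: only the right-hand side counts.
        return size(phi[2])
    if tag in ("and", "or", "imp"):
        return max(size(phi[1]), size(phi[2]))
    raise ValueError(f"unknown connective: {tag!r}")


p, q = ("atom", "p"), ("atom", "q")
example = ("wand", ("star", p, q), ("star", p, ("star", q, ("I",))))
```

For `example`, the left-hand side of the multiplicative implication is ignored and the right-hand side contributes 1 + (1 + 1), so `size(example)` evaluates to 3.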

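Lemma 3 (Monotonicity). If $m \cong_{L',n} m'$, $L'' \subseteq L'$ and $n' \leq n$ then
$m \cong_{L'',n'} m'$.

Proof. From the conditions 2b and 2d of Definition 9.

Lemma 4. If $m \cdot m' \cong_{L',n_1+n_2} t$ then there exist $t', t''$ such that $t \sqsubseteq t' \cdot t''$,
$m \cong_{L',n_1} t'$ and $m' \cong_{L',n_2} t''$.

Proof. From the conditions 2c and 2d of Definition 9.

Lemma 5. If $m \cong_{L_\varphi, s(\varphi)} m'$ and $m \models \varphi$ then $m' \models \varphi$.

Proof. By induction on the structure of $\varphi$. We develop some interesting cases.

- Case $\varphi = I$. As $m \models \varphi$, we have $e \sqsubseteq m$ and $m' \cong_{\emptyset,1} m$; from condition 2a of a
boundable monoid we deduce $e \sqsubseteq m'$, and hence $m' \models I$.

- Case $\varphi = \psi * \psi'$. By definition there exist $m_1$ and $m_2$ such that $m_1 \models \psi$,
$m_2 \models \psi'$ and $m_1 \cdot m_2 \sqsubseteq m$. Thus by Lemma 4, there exist $m'_1, m'_2$ such that
$m_1 \cong_{L_\varphi, s(\psi)} m'_1$ and $m_2 \cong_{L_\varphi, s(\psi')} m'_2$. By the induction hypothesis, we can conclude.

- Case $\varphi = \psi \mathbin{-\!*} \psi'$. We consider $m_1$ such that $m_1 \models \psi$; then $m \cdot m_1 \models \psi'$. As
$m \cong_{L_\varphi, s(\varphi)} m'$, we have $m \cdot m_1 \cong_{L_\varphi, s(\varphi)} m' \cdot m_1$ (due to condition 2c of a
boundable monoid). We conclude with the induction hypothesis.

Corollary 1. For any resource $m$ and any formula of the form $\varphi \mathbin{-\!*} \psi$, $m \models \varphi \mathbin{-\!*} \psi$
iff for any $m'$ taken in the finite quotient of $M$ given by condition 2e of Definition 9
such that $m' \models \varphi$ and $m \cdot m'$ is defined, $m \cdot m' \models \psi$.

Proof. Immediate consequence of Lemma 5 and condition 2 of Definition 9.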

As the case of the $-\!*$ operator is the crux of the satisfaction problem, we have

Theorem 1 (Satisfaction Decidability for BI). Given a boundable partial
resource monoid $\mathcal{M} = (M, \cdot, e, \sqsubseteq)$, for any BI formula $\varphi$ and any resource $m \in M$,
$m \models \varphi$ is decidable.

Proof. Consequence of Corollary 1 (for the finite test to check formulae of the
form $\varphi \mathbin{-\!*} \psi$) and of condition 1 of a boundable resource model (to ensure that
there is only a finite set of decompositions necessary to check a formula such as $\varphi * \psi$).
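To make the finite test concrete, here is a small Python sketch of satisfaction checking over a monoid that is already finite, so that it is trivially boundable. Everything specific in it is an assumption chosen for illustration: the carrier {0, 1, 2, 3}, bounded addition as the partial composition, equality as the ordering, and the tuple encoding of formulae. It is not the general procedure of the proof, which works on the quotients of Definition 9.

```python
M = range(4)   # hypothetical finite carrier: resources 0..3
E = 0          # neutral element


def dot(a, b):
    """Partial composition: bounded addition, undefined (None) above 3."""
    return a + b if a + b <= 3 else None


def sat(m, phi, val):
    """Decide m |= phi, with the ordering on resources taken to be equality."""
    tag = phi[0]
    if tag == "I":
        return m == E
    if tag == "atom":
        return m in val[phi[1]]
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag == "and":
        return sat(m, phi[1], val) and sat(m, phi[2], val)
    if tag == "or":
        return sat(m, phi[1], val) or sat(m, phi[2], val)
    if tag == "imp":
        return (not sat(m, phi[1], val)) or sat(m, phi[2], val)
    if tag == "star":
        # Search all decompositions a . b of m.
        return any(dot(a, b) == m and sat(a, phi[1], val) and sat(b, phi[2], val)
                   for a in M for b in M)
    if tag == "wand":
        # Finite quantification over all composable resources satisfying phi[1].
        return all(sat(dot(m, a), phi[2], val)
                   for a in M
                   if dot(m, a) is not None and sat(a, phi[1], val))
    raise ValueError(f"unknown connective: {tag!r}")


def valid(phi, val):
    """Validity on the model, expressed as e |= top -* phi (as in Lemma 2)."""
    return sat(E, ("wand", ("top",), phi), val)


val = {"one": {1}}
one = ("atom", "one")
two = ("star", one, one)
```

With this valuation, `sat(2, two, val)` holds because 2 decomposes as 1 + 1, and `sat(1, ("wand", one, two), val)` holds because the only resource satisfying `one` composes with 1 to give 2.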



Corollary 2 (Validity Decidability for BI). Given a boundable partial
resource monoid $\mathcal{M} = (M, \cdot, e, \sqsubseteq)$, for any BI formula $\varphi$, $\models_{\mathcal{M}} \varphi$ is decidable.

Proof. Immediate consequence of Theorem 1 using Lemma 2.

4.2 Validity by Model Checking in BI-Loc 

In order to extend the above results to BI-Loc and resource trees we must add
the tree structure without losing the ability to bound the resource tree when we
consider the $-\!*$ operator. Therefore, we have two main problems: 1) to restrict
the location names to deal with, since an infinity of locations leads to an infinity
of trees, and 2) to bound the height of the trees.

Given a BI-Loc formula $\varphi$, $\mathcal{L}_\varphi$ denotes the finite set of location names of $\varphi$.

Lemma 6. Given a formula $\varphi$, a resource tree $P$ and two location names $l$ and
$l'$ which do not belong to $\mathcal{L}_\varphi$, $P \models \varphi$ if and only if $P\{l/l'\} \models \varphi$.

Proof. By structural induction on $\varphi$.

Corollary 3. For any BI-Loc formula $\varphi$ and any resource tree $P$, there exists a
resource tree $P'$ such that the location names that occur in $P'$ belong to $\mathcal{L}_\varphi \cup \{l_\varphi\}$
(where $l_\varphi$ is a location name which does not occur in $\varphi$) and we have $P \models \varphi$ if
and only if $P' \models \varphi$.

Proof. We replace all occurrences of locations in $P$ which are not in $\varphi$ by $l_\varphi$ and
obtain $P'$.

For a given resource tree $P$, we write $h(P)$ for its height, which is defined as usual
for a tree. Moreover, we define the height of a BI-Loc formula.

Definition 11 (Height of a Formula). Given a BI-Loc formula $\varphi$, the height
$h(\varphi)$ of $\varphi$ is inductively defined as:

- $h(I) = 1$ - $h([l]\varphi') = 1 + h(\varphi')$ - $h(\varphi' \vee \varphi'') = \max(h(\varphi'), h(\varphi''))$

- $h(p) = 1$ - $h(\varphi' \mathbin{-\!*} \varphi'') = h(\varphi'')$ - $h(\varphi' \wedge \varphi'') = \max(h(\varphi'), h(\varphi''))$

- $h(\bot) = 0$ - $h(\top) = 0$ - $h(\varphi' \rightarrow \varphi'') = \max(h(\varphi'), h(\varphi''))$

- $h(\varphi' * \varphi'') = \max(h(\varphi'), h(\varphi''))$
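The height of Definition 11 can be computed in the same recursive style as the size. As before, the tuple encoding of formulae is a hypothetical one chosen for illustration, with `("loc", l, a)` standing for the location modality applied to `a`.

```python
def height(phi):
    """Height of a BI-Loc formula encoded as a nested tuple (hypothetical encoding)."""
    tag = phi[0]
    if tag in ("I", "atom"):
        return 1
    if tag in ("top", "bot"):
        return 0
    if tag == "loc":
        # ("loc", label, body): one more level of the tree is inspected.
        return 1 + height(phi[2])
    if tag == "wand":
        return height(phi[2])
    if tag in ("star", "and", "or", "imp"):
        return max(height(phi[1]), height(phi[2]))
    raise ValueError(f"unknown connective: {tag!r}")


p = ("atom", "p")
nested = ("loc", "l", ("star", ("loc", "k", p), ("I",)))
```

For `nested`, the inner modality contributes 1 + 1, the multiplicative conjunction takes the maximum with the height 1 of the unit, and the outer modality adds one more level, so `height(nested)` evaluates to 3.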

The height of a formula is the maximum height of the trees we have to 
consider in order to check if a formula is valid or not. Therefore, we have the 
following result: 

Proposition 1. For any BI-Loc formula $\varphi$ and any resource tree $P$, there exists
a resource tree $P'$ such that $h(P') \leq h(\varphi)$ and $P \models \varphi$ if and only if $P' \models \varphi$.

Proof. The interesting case occurs when $h(P) > h(\varphi)$. It is sufficient to cut
subtrees which are too high and to prove, by structural induction on $\varphi$, that the
resulting tree satisfies the condition.

Contrary to what is done in [4], we do not need to limit the width of the
tree. This is due to the rule $([l]m) \mid ([l]m') \equiv [l](m \mid m')$. Consequently, the width
of the tree is limited by the number of location names which are relevant.

Theorem 2 (Satisfaction Decidability). Given a resource tree model $\mathcal{T} =
(\mathcal{M}, [\![\cdot]\!], \models)$ where $\mathcal{M}$ is a boundable partial resource monoid, for any BI-Loc
formula $\varphi$ and any resource tree $P \in \mathcal{T}$, $P \models \varphi$ is decidable.

Proof. We can extend the congruence relations of a boundable monoid in order to
include restrictions on height and location names introduced by the last lemmas.
Then, we deal with the set of trees modulo this relation and prove that this set
is finite.

Corollary 4 (Validity Decidability). Given a resource tree model $\mathcal{T} =
(\mathcal{M}, [\![\cdot]\!], \models)$ where $\mathcal{M}$ is a boundable partial resource monoid, for any BI-Loc
formula $\varphi$, $\mathcal{T} \models \varphi$ is decidable.

Proof. Immediate consequence of Theorem 2 with Lemma 2.

In order to build a sequent calculus to deduce the validity of a formula from
others, we can define a sequent calculus with contexts as in [4]. A context $\Gamma$ or $\Delta$
is a finite multiset of expressions $P : \varphi$ where $P$ is a tree and $\varphi$ is a formula. We
obtain such a calculus by only adding the following rules to the sequent calculus
of [4].

$$\frac{P \equiv Q}{\Gamma, P : \varphi \vdash Q : \varphi, \Delta} \qquad \frac{\forall Q, R.\ P \equiv Q \mid R \Rightarrow \Gamma, Q : \varphi, R : \psi \vdash \Delta}{\Gamma, P : \varphi * \psi \vdash \Delta}$$

$$\frac{\forall m.\ P \equiv m \Rightarrow m \notin [\![p]\!]}{\Gamma, P : p \vdash \Delta}\ (Prop : L) \qquad \frac{\exists m.\ m \equiv P \text{ and } m \in [\![p]\!]}{\Gamma \vdash P : p, \Delta}\ (Prop : R)$$

The interpretation $[\![P_1 : \varphi_1, \ldots, P_n : \varphi_n \vdash Q_1 : \psi_1, \ldots, Q_m : \psi_m]\!]$ of a sequent is
defined as follows: if $P_1 \models \varphi_1, \ldots, P_n \models \varphi_n$ then $Q_1 \models \psi_1$ or $\ldots$ or $Q_m \models \psi_m$. We can
prove by structural induction that this calculus is sound and complete according
to the semantics of resource trees and also that the existence of a derivation of
a sequent $\Gamma \vdash \Delta$ is decidable.

5 Resource Trees and Applications 

XML trees. The example of Section 2 presents a resource tree representation
of an XML entry, in which location names are arbitrary and do not have a
particular meaning. In order to abstract from location names, it seems
useful to provide quantifications on locations in BI-Loc. In this context, we need
standard quantifications to state that there exists a location where a proposition
is true and that a proposition is true for all locations. We extend the
forcing relation of the resource trees as follows: $P \models \exists l.\varphi$ if and only if there
exists $l'$ such that $P \models \varphi\{l'/l\}$, and $P \models \forall l.\varphi$ if and only if for all $l'$, $P \models \varphi\{l'/l\}$.
Such quantifications, mixed with the expressivity of resource trees, make it possible
to handle useful data structures, knowing that they may lead to undecidability if
their use is not restricted enough.

Let us consider how we can represent "pointers" in trees, as proposed in
[6]. If we consider the XML example below and a corresponding resource tree,
we observe that the content of a capital tag must refer to a city
and the content of the state_of tag must refer to a state.

    <state id='s2'>
      <scode>NE</scode>
      <sname>Nevada</sname>
      <capital>idref='c3'</capital>
    </state>
    <city id='c3'>
      <ccode>CCN</ccode>
      <cname>Carson City</cname>
      <state_of>idref='s2'</state_of>
    </city>

In our model, representing and checking the property of the link between a capital
and a city for a given id amounts to checking the satisfaction of the following
formula:

$\forall l_1.(\exists l_2.(\exists l_3.([l_1](state * [l_2](capital * [l_3](content \wedge id))) * \top)) \rightarrow \exists l_4.([l_4](city * id) * \top))$

Here state (resp. capital) means that the location contains a state (resp.
capital) tag. Content and id respectively indicate that the node describes
the content of the above tag and that a resource corresponding to the given id is
present. Thus, $content \wedge id$ means that the content of the tag is the id we are
looking for. With a similar approach it is possible to check that two nodes do
not have the same id.

Hierarchical Storage. Ahmed et al. have recently proposed a logic for
reasoning about logical storage [1], which is inspired by the BI logic and the
Ambients logic. As our work is also inspired by the former, it seems natural to
analyze whether our proposal can be used and adapted to reason about
hierarchical storage. Let us consider the store semantics of [1]. Let Val be an
abstract set of values; a store is a partial map from paths p (a list of locations)
to values v: Store $s \in Path \rightharpoonup Val$. An intuitive relation arises between Path and
our locations and between Val and our monoid of resources. Then, the store
semantics can be seen as a specific resource tree model, the path names of this
semantics corresponding to the set of locations. Similarly, Val corresponds to
a monoid such that the neutral element is the null value. The composition
relation is totally undefined except for composition with the neutral element
(we cannot have two values in a given store). However, BI-Loc is propositional
(not first-order) and does not provide operators to check adjacency of stores.
Using resource trees and BI-Loc to reason about hierarchical storage is
consequently less expressive, but it has other advantages. If we restrict
the propositions to type verification (the store contains a value of a given type),
the resource model satisfies the requirements for being model checked and
therefore we can check the validity of BI-Loc formulae in the store semantics.
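The monoid of values described above, where composition is defined only when one operand is the neutral element, can be sketched as follows. The representation of stores as dictionaries from path tuples to values, the use of `None` as the null value and the `UNDEFINED` sentinel are assumptions made for this illustration; they are not taken from [1].

```python
NULL = None            # neutral element: the null value
UNDEFINED = object()   # sentinel marking an undefined composition


def compose_values(a, b):
    """Compose two values: defined only if at least one of them is the null value."""
    if a is NULL:
        return b
    if b is NULL:
        return a
    return UNDEFINED


def compose_stores(s1, s2):
    """Compose two stores (dicts from path tuples to values), path by path."""
    result = dict(s1)
    for path, value in s2.items():
        combined = compose_values(result.get(path, NULL), value)
        if combined is UNDEFINED:
            # The same path would hold two values: the whole composition is undefined.
            return UNDEFINED
        result[path] = combined
    return result


s1 = {("a", "b"): 1}
s2 = {("a", "c"): 2}
s3 = {("a", "b"): 3}
```

Composing `s1` with `s2` gives a store defined on both paths, whereas composing `s1` with `s3` is undefined because the path `("a", "b")` would receive two values.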

Spatial logics for Trees. Having previously compared resource trees
with other tree models, we now propose an extension of our logic to encode the
spatial logic for trees. In [4] one can have two sibling edges with the same label.
So, n[0]|n[0] is a tree with two sibling nodes accessible through a path n. A
corresponding resource tree can be $[l_1]\overline{n} \mid [l_2]\overline{n}$ where $\overline{n}$ is the result of a mapping
from the set of location names of the edge-labelled tree into the monoid of a
given set of resources. But such a translation introduces new arbitrary location
names in the resulting resource tree. Thus if we aim to find a BI-Loc formula $\varphi$
such that the resulting tree satisfies $\varphi$, we have to handle those locations.
But the standard quantifications do not behave as they should. A
translation of the formula l[0]|l[0] should intuitively be $\exists l_1.\exists l_2.[l_1]I * [l_2]I$.
However, with standard quantifications, we cannot ensure that $l_1$ and $l_2$ are
distinct. Then, we must restrict the quantifications to ensure that existential
quantification does not make equal two locations which must be distinct. This
point is related to the general problem of label (or name) restriction in our
model, which we will develop in further work. With a fresh label quantification, the
corresponding BI-Loc formula of $n[\varphi]$ (called location) would be $\exists l.[l](\overline{n} * \overline{\varphi})$
(where $\overline{\varphi}$ is the corresponding formula of $\varphi$). As in the previous part about
XML, we hope that the restricted use of quantifications will not alter some
decidability results. Moreover, we can show that a formula of SLT logic that
includes the placement operator @ can be transformed into an SLT
formula without @, and this translation does not alter the satisfaction relation
defined for edge-labelled trees.

6 Decidability for All Resource Models

Reasoning about the validity of a formula for all resource tree models cannot be done
by model checking but by theorem proving. The sequent calculus with contexts
of the previous section is not a solution because contexts require working on a
given model. Propositional BI has been proved decidable by theorem proving
[11] and there exist several calculi [10,11,15] to check the validity of BI formulae.
As for model checking, we can extend the decidability result to BI-Loc with
some sequent calculus rules that handle locations and also consider extensions
of the related proof-search methods.


[Figure: the sequent rules of BI, together with the rule

$$\frac{\Gamma \vdash \varphi}{[l]\Gamma \vdash [l]\varphi}\ (Loc) \quad \text{with } [l]\Gamma \text{ not an empty bunch}$$
]

Fig. 3. A Sequent Calculus for BI-Loc


A sequent $\Gamma \vdash \varphi$ is valid if and only if for any resource tree model $\mathcal{T}$ and for
all $P \in \mathcal{T}$, $P \models \varphi_\Gamma$ implies $P \models \varphi$. The sequent calculus for BI-Loc is given in
Figure 3. The rules are those of BI plus the (Loc) rule that handles the location
modality. In this rule, $[l]\Gamma$ means that each formula of the bunch has the form
$[l]\varphi$. It can be read bottom up as: resource trees which satisfy $[l]\Gamma$ also satisfy
$[l]\varphi$ if and only if resource trees satisfying $\Gamma$ also satisfy $\varphi$. The non-emptiness
condition in the (Loc) rule is required to ensure that the location $l$ exists in the
resource tree. Without this requirement, we could prove the sequent $\vdash [l](\varphi \rightarrow \varphi)$,
which is false since it is not valid for a tree which does not contain a location $l$
(as $P \models [l]\psi$ if and only if there exists $P'$ such that $[l]P' \preceq P$ and $P' \models \psi$).

Theorem 3 (Soundness and Completeness). There exists a proof of $\Gamma \vdash \varphi$
in the BI-Loc sequent calculus if and only if $\Gamma \vdash \varphi$ is valid.

Proof. As BI is sound and complete for the partial monoid semantics [11], we
show that sequent rules inherited from BI are sound according to the resource
tree semantics. Then, we prove that the (Loc) rule is also sound. Given a sequent
$\Gamma \vdash \varphi$ and a resource tree $P$ such that $P \models \varphi_\Gamma$ (where $\varphi_\Gamma$ represents any
formula in $\Gamma$) and $P \models \varphi$ then, by definition, $[l]P \models [l]\varphi_\Gamma$ and $[l]P \models [l]\varphi$. The
completeness proof can be done by proving that if $\Gamma \nvdash \varphi$ then there exists $P$
such that $P \models \varphi_\Gamma$ and $P \not\models \varphi$.

Proof search for BI can be extended to handle locations, which do not introduce
infinite loops. As in the decidability by model checking, the crux of the
finite model property is to bound the infinite tableau branch introduced by the
$-\!*$ operator (see [11] for details). As the location modality does not introduce
any infinite quantification on trees, we can build a finite tableau for BI-Loc.
Moreover, we must study whether we have criteria to decide if a given branch of a
tableau is closed or not. First, we must ensure that a location exists for some
subformula. Furthermore, as the $\bot$ unit is local, we can obtain a partially closed
branch (a branch which is not closed for all locations). Both can be handled
systematically: the resulting criteria are more complex, but we can still decide
whether a tableau is closed or not.

7 Conclusions and Perspectives 

This work emphasizes that BI logic and its resource-based semantics [11] form a
logical kernel from which spatial and separation logics are defined for reasoning
about various data structures or about resource distribution and interaction. We
have introduced some spatial information by fixing resources in some locations
and proposed a separation and spatial logic, BI-Loc, from which we can describe
various resource models. In this context, we have introduced a particular data
model, called resource tree, in which nodes contain resources that are elements of
a partially defined monoid. At present, the representation is static because
once the resources are distributed in the tree, they cannot move from their
initial location. A first solution could be to extend the BI-Loc logic with movement
modalities, as has been done with linear logic in the context of modeling
distributed and mobile systems [2]. Our results about validity by model checking
and by deduction for BI and BI-Loc appear important for a further study of
validity and reasoning in new separation and spatial logics like the one dedicated
to hierarchical storage [1]. In this context, it would be relevant to compare our
approach with the spatial logic for concurrency [3], which also includes spatial
and separation aspects related to a concurrent calculus model. We aim to study
in depth how to use such resource-based models (not only trees) and related
logics for program verification. Moreover, we will study how our results about
theorem proving in BI logic [11] can be extended to handle distribution and
mobility aspects.

Abstract. Transactions are commonly described as being ACID: All-
or-nothing, Consistent, Isolated and Durable. However, although these
words convey a powerful intuition, the ACID properties have never been
given a precise semantics in a way that disentangles each property from
the others. Among the benefits of such a semantics would be the ability
to trade off the value of a property against the cost of its implementa-
tion. This paper gives a sound equational semantics for the transaction
properties. We define three categories of actions, A-actions, I-actions and 
D-actions, while we view Consistency as an induction rule that enables 
us to derive system-wide consistency from local consistency. The three 
kinds of action can be nested, leading to different forms of transactions, 
each with a well-defined semantics. Conventional transactions are then 
simply obtained as ADI-actions. From the equational semantics we de- 
velop a formal proof principle for transactional programs, from which we 
derive the induction rule for Consistency. 



1 Introduction 

Failure, or rather partial failure, is one of the most complex issues in computing. 
By definition, a failure occurs when some component violates its specification: 
it has “gone wrong” in some serious but unspecified manner, and therefore rea- 
soning about it means reasoning about an unknown state. To cope with such a 
situation we use abstractions that provide various kinds of “failure-tight com- 
partment”: like water-tight doors on a ship, they keep the computation afloat 
and give us a dry place to stand while we try to understand what has gone 
wrong and what can be done about it. The familiar notion of address space in 
an operating system is one such: the address space boundary limits the damage 
that can be done by a misbehaving program, and gives us some confidence that 
the damage has not spread to programs in other address spaces. 

Transactions. The most successful abstraction for coping with failure is the
transaction, which has emerged from earlier notions of atomic action [1]. The
most popular characterization of transactions is due to Haerder and Reuter [2],
who coined the term ACID to describe their four essential properties. 

The “A” in ACID stands for all-or-nothing; it means that a transaction ei- 
ther completes or has no effect. In other words, despite failures, the transaction 
never produces partial effects. “I” stands for isolation; it means that the in- 
termediate states of data manipulated by a transaction are not visible to any 
other committed transaction, i.e., to any transaction that completes. “D” stands
for durability; it means that the effects of a committed transaction are not un- 
done by a failure. “C” stands for consistency; the C property has a different 
flavour from the other properties because part of the responsibility for maintain- 
ing consistency remains with the programmer of the transaction. In contrast, 
all-or-nothing, isolation and durability are the system’s responsibility. Let us 
briefly explore this distinction. 

Consistency is best understood as a contract between the programmer writing 
individual transactions and the system that implements them. Roughly speak- 
ing, the contract is the following: if the programmer ensures the consistency of 
every individual transaction, and also ensures that the initial state is consistent, 
then the system will ensure that consistency applies globally and forever, despite 
concurrency and failure. Consistency is thus like an induction axiom: it reduces 
the problem of maintaining the consistency of the whole of a concurrent system 
subject to failure to the much simpler problem of maintaining the consistency of 
a series of failure-free sequential actions. One of the main results of this paper
is to state and prove this folklore idea for the first time in a formal way (see 
Theorem 2 in section 4). 

The “ACID” formulation of transactions has been current for twenty years. 
During that time transactions have evolved, and have spawned variants such as 
nested transactions [3] and distributed transactions [4]. Yet, as far as we are 
aware, there has been no successful formalization of exactly what the A, D and 
I properties mean individually. In this paper we present such a formalization. 

We believe that this work has value beyond mere intellectual interest. First, 
various kinds of atomic, durable and isolated actions are routinely used by the 
systems community; this work illuminates the relationship of these abstractions 
to the conventional transactions (T-actions) of the database world. Second, we 
have hopes that by separating out the semantics of A-, I- and D-actions, and 
showing how they can be composed to create T-actions, we will pave the way for 
the creation of separate implementations of A-, I- and D-actions, and their com- 
position to create implementations of T-actions. To date, the implementations 
of the three properties have been interdependent. For example, implementing 
isolation using timestamp concurrency control techniques rather than locking 
changes the way that durability must be implemented. 



Background. Transactions promise the programmer that he will never see a fail- 
ure. In a 1991 paper [5], Black attempted to capture this promise using equiva- 
lence rules. Translated into a notation consistent with that used in the remainder 
of this paper, the rules for all-or-nothing (1) and durability (2) were 

$$\langle a \rangle \parallel \lightning = \langle a \rangle + \mathit{skip} \qquad (1)$$

$$\langle a \rangle \,;\, \lightning = \langle a \rangle \qquad (2)$$

where $\langle a \rangle$ represents a transaction that executes the command $a$, $\parallel$ represents 
parallel composition, $\lightning$ represents a failure, and skip is the null statement. The 
alternation operator $+$ represents non-deterministic choice, so the right-hand 
side of (1) is a program that either makes the same state transformation as (a), 
or as skip — but we cannot a priori know which. The intended meaning of (1) 
was that despite failures, the effect of the transaction (a) would either be all of 
(a) or nothing (skip). Similarly, equation (2) says that a failure occurring after 
a transaction has committed will have no effect on that transaction. 

These rules are not independent, however. For example, relaxing the all-or- 
nothing rule changes equation (2) as well as equation (1). It also turns out that 
the failure symbol $\lightning$ cannot be treated as a process (see section 2). Thus, the main contribution of 
the 1991 paper was the identification of an interesting problem, not its solution. 
The solution was not trivial: it has taken another twelve years! 

The reader might ask why skip is one of the possible meanings of $\langle a \rangle \parallel \lightning$. 
The answer is that in practice this is the price that we must pay for atomicity: 
the only way that an implementation can guarantee all-or-nothing, in a situation 
where an unexpected failure makes it impossible to give us all, is to give us 
nothing. 

Implementations have taken advantage of this equivalence to abort a few 
transactions even when no failure actually occurs: even though it might have been 
possible to commit the transaction, the implementation might decide that it is 
inconvenient or too costly. For example, in systems using optimistic concurrency 
control, a few logically unnecessary aborts are considered to be an acceptable 
price to pay for increased throughput. 

In some centralized systems failure is rare enough that it may be acceptable to 
abort all active transactions when a failure does occur. However, in a distributed 
system, failure is commonplace: it is not acceptable to abort every computation 
in a system (for example, in the Internet) because one data object has become 
unavailable. Thus, we need to be able to reason about partial failures. 

Related work. Transactions originated in the database arena [6] and were even- 
tually ported to distributed operating-systems like Tabs [7] and Camelot [8], as 
well as to distributed programming languages like Argus [9], Arjuna [10], Avalon 
[11], KAROS [12] or Venari [13]. During this phase, transaction mechanisms be- 
gan to be “deconstructed” into simpler parts. The motivation was to give the 
programmer the ability to select — and pay for — a subset of the transaction 
properties. However, to our knowledge there was no attempt to define precisely 
what each property guaranteed, or how the properties could be combined. 

The “Recoverable Virtual Memory” abstraction of the Camelot system 
is an example of a less-than-ACID transaction. Recoverable Virtual Memory 
supported two kinds of transactions: all-or-nothing transactions (called 
“no-flush” transactions), and all-or-nothing transactions that are also durable. 
Concurrency-control was factored out into a separate mechanism that the pro- 
grammer could use to ensure isolation. This gave the programmer a very flexible 
way of trading-off transaction properties for efficient implementations, but the 
meaning of the various properties was not rigorously defined and it is not clear 
what guarantees their combination would enjoy. Similarly, in Venari, the pro- 
grammer can easily define durable transactions, atomic transactions and isolated 
transactions, but the meaning of the combinations of these properties was not 
defined formally. 

Our equational semantics provide a way to reason about individual properties 
of less-than-ACID transactions and about the meaning of their composition. 

The ACTA formalism [14] was introduced to capture the functionalities of 
various transactional models. In particular, the aim was to allow the specifica- 
tion of significant events beyond commit and abort (useful for long-lived transac- 
tions) and to allow the specification of arbitrary transaction structures in terms 
of dependencies between transactions (read-from relations). The notation enabled 
one to informally describe various transaction models, such as open and 
nested transactions, but did not attempt to capture the precise meaning of the 
individual transaction properties, nor was it used to study their composition. 

Interestingly, all modern transactional platforms we know of, including Arju- 
naTS [15], BEA Weblogics [16], IBM Webspheres [17], Microsoft MTS [18], and 
Sun EJB [19] provide the programmer with the ability to select the best variant of 
transactional semantics for a given application. Our equational semantics might 
provide a sound theoretical framework to help make the appropriate choice. 

In [20] the authors also formalize the concepts of crash and recovery by 
extending the $\pi$-calculus. They are able to prove correctness of the two-phase 
commit protocol, which can be used to implement transactions, but they do not 
model full transactions nor a fortiori give a consistency preservation theorem as 
we do. Contrary to our work which views serializability as the only meaning of 
the isolation property, [21] defines the notion of semantic correctness: a schedule 
is considered correct if its effect is the same as some serializable schedule, and 
not only if it is serializable (the concept of effect is modeled using Hoare logic). 

Overview. Our technical treatment is organized as follows. We first define a 
syntax for pre-processes, in which actions are combined using sequential, parallel 
and non-deterministic composition. With a simple kind system, we select from 
these pre-processes a set of well-formed processes. We then present a set of axioms 
that define an equivalence relation on well-formed processes and discuss the 
rationale underlying these axioms. 

By turning the axioms into a rewriting system modulo some structural equalities, 
we prove that every process has a unique canonical form (Theorem 1; sec. 4). 
Canonical forms contain neither embedded failures nor parallel composition, 
and thus they allow us to capture the semantics of an arbitrary process 
in a simple way. 

If we restrict further the shape of a process so that it is built exclusively 
from locally consistent sequences, then we can prove that its canonical form is 
also built from consistent sequences (Theorem 2; sec. 4). Hence, we can show 
that in our model of transactions, local consistency implies global consistency. 

For space reasons this document does not contain the proofs of theorems. 
They can be found in a companion document [22] together with more detailed 
explanations about this work. 



2 Processes 



We start with the definition of several interesting classes of processes. 

As is usually the case for programming languages, we introduce the set of 
processes in two stages: we first define a set of syntactically correct objects that 
we call pre-processes and we then consider a semantically correct subset whose 
elements are called well-formed processes or, more briefly, processes. 

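The syntax of pre-processes is as follows. 

$$
\begin{array}{rcll}
P, Q & ::= & a, b, c, \ldots & \text{primitive action} \\
     & \mid & \langle P \rangle_A & \text{all-or-nothing action} \\
     & \mid & \langle P \rangle_D & \text{durable action} \\
     & \mid & \langle P \rangle_I & \text{isolated action} \\
     & \mid & P \,;\, Q & \text{sequential composition} \\
     & \mid & P \parallel Q & \text{parallel composition} \\
     & \mid & P + Q & \text{non-deterministic choice} \\
     & \mid & \mathit{skip} & \text{null action} \\
     & \mid & \mathit{crash}(P) & \text{one or more crashes and recoveries during } P
\end{array}
$$

The operators have the following precedence: ; binds more tightly than $\parallel$, which binds 
more tightly than +. 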
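
The grammar above maps naturally onto an algebraic data type. The following is a minimal sketch in Python and is not part of the original calculus: the class names (`Prim`, `Bracket`, `Seq`, `Par`, `Alt`, `Skip`, `Crash`) and the ASCII rendering `<P>A` for an all-or-nothing bracket are assumptions chosen for illustration. The `show` function inserts parentheses only where the stated precedence requires them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Prim:
    name: str                 # primitive action a, b, c, ...


@dataclass(frozen=True)
class Bracket:
    kind: str                 # "A" (all-or-nothing), "D" (durable) or "I" (isolated)
    body: "Proc"


@dataclass(frozen=True)
class Seq:
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Par:
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Alt:
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Crash:
    body: "Proc"


Proc = Union[Prim, Bracket, Seq, Par, Alt, Skip, Crash]

# Binding strength: ";" binds tighter than "||", which binds tighter than "+".
PRECEDENCE = {Alt: 1, Par: 2, Seq: 3}
SYMBOL = {Alt: " + ", Par: " || ", Seq: " ; "}


def show(p: Proc, outer: int = 0) -> str:
    """Render a pre-process, adding parentheses only where precedence requires."""
    if isinstance(p, Prim):
        return p.name
    if isinstance(p, Skip):
        return "skip"
    if isinstance(p, Crash):
        return f"crash({show(p.body)})"
    if isinstance(p, Bracket):
        return f"<{show(p.body)}>{p.kind}"
    prec = PRECEDENCE[type(p)]
    text = show(p.left, prec) + SYMBOL[type(p)] + show(p.right, prec)
    return f"({text})" if prec < outer else text
```

For instance, `show(Seq(Prim("a"), Alt(Prim("b"), Prim("c"))))` renders the choice in parentheses because + binds less tightly than ;. Note that this type describes pre-processes only; it does not enforce well-formedness.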



Primitive actions. A primitive action represents an access to a shared resource. 
A typical primitive action might be the invocation of a method on a global 
object. We use a different symbol a,b, . . . for each primitive action. 



Decomposed transactions. Three kinds of brackets are used to group actions: 

$\langle P \rangle_A$ processes are either executed completely or not at all. 

$\langle P \rangle_I$ processes are isolated; a parallel execution of such processes always has the 
same effect as some sequential execution of the same processes. 

$\langle P \rangle_D$ processes are durable; once completed, their effect cannot be undone by 
subsequent failures. 

There is no fourth kind of bracket for consistent actions; as discussed in 
Section 1, consistency is a meta-property that needs to be established by the 
programmer for individual sequences of actions. Also missing is a kind of bracket 
corresponding to classical, full-featured transactions. There is no need for it, since 
such transactions can be expressed by a nesting of all-or-nothing, durable, and 
isolated actions. That is, we will show that the process $\langle\langle\langle P \rangle_A \rangle_D \rangle_I$ represents $P$ 
executed as a classical transaction. 
Formal reasoning about failures requires that we delimit their scope. In our 
calculus we use the action brackets for this purpose also. Thus, a crash/recovery 
event inside an action should be interpreted as a crash/recovery event that is 
local to the memory objects accessible from that action. For instance, in the 
action $\langle \langle P \rangle_I \parallel \langle Q \,;\, \lightning \ldots$ the crash/recovery will affect only the nested 
action containing $Q$, and not the action containing $P$ nor the top-level action. 
In contrast, a crash/recovery event occurring in some action P will in general 
affect actions nested inside P. 



Failures. The term crash(P) represents the occurrence of one or more 
crash/recovery events during the execution of process P. Each crash/recovery event 
can be thought of as an erasure of the volatile local memory of the process P (the 
crash) followed by the reinitialization of that memory from the durable backup 
(the recovery). 

We represent failures that occur independently of any other action as if they 
occurred during a null action. We use the following shorthand notation to rep- 
resent such a single failure event: 

$$\lightning = \mathit{crash}(\mathit{skip})$$

One might wonder why we did not instead start by taking the symbol $\lightning$ as a 
primitive and letting crash(P) be an abbreviation for $P \parallel \lightning$. This was in fact 
our initial approach, but we encountered problems that led us to the interesting 
observation that crash/recovery is not an isolated action and that it cannot 
therefore be composed in parallel with other processes. This is explained in 
more detail in Section 2. 

Note also that we consider a crash/recovery to be atomic. This means that 
we do not permit the crash phase to be dissociated from the recovery phase. We 
exclude, for instance, the possibility that another action can occur between a 
crash and its associated recovery. 

Well-formed processes. We have a very simple notion of well-formedness in our 
calculus. The only restriction that we impose is that a process which is to be 
executed in parallel with others must be interleavable. Roughly speaking, an 
interleavable process is a term of the calculus that is built at the outermost level 
from actions enclosed in isolation brackets. These brackets define the grain of 
the interleaving. 

We define the set of (well-formed) processes and the set of interleavable 
processes by mutual induction. The grammar of well-formed processes is almost the 
same as the grammar for pre-processes except that now parallel composition is 
allowed only for interleavable processes. 

An important property of well-formed processes is that crash(P) is not 
interleavable. So, for example, 

$$\langle P \rangle_I \parallel \langle Q \rangle_I \parallel \lightning$$

is not a well-formed process. We exclude this process for the following reason: 
seen by itself, $\langle P \rangle_I \parallel \langle Q \rangle_I$ is equivalent to some serialization of $P$ and $Q$: 
either $\langle P \rangle_I \,;\, \langle Q \rangle_I$ or $\langle Q \rangle_I \,;\, \langle P \rangle_I$. Applying a similar serialization law to the 
$\lightning$ component, one could be tempted to conclude that the crash will happen possibly 
during P or during Q, but certainly not during both P and Q. However, 
such a conclusion is clearly too restrictive, since it excludes every scheme for 
implementing transactions in parallel, and admits as the only possible imple- 
mentations those which execute all isolated actions in strict sequence. 
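
The mutual induction between well-formed and interleavable processes can be sketched as two mutually recursive predicates. This is only an illustration under stated assumptions: the text gives the interleavable grammar only roughly, so the reading used here (skip, isolation brackets, and their sequential, parallel and alternative compositions are interleavable; everything else is not) is an assumption, as are the tuple encoding and the function names.

```python
# Assumed encoding of terms as nested tuples:
#   ("prim", name), ("skip",), ("crash", P), ("bracket", kind, P),
#   ("seq", P, Q), ("par", P, Q), ("alt", P, Q)   with kind in {"A", "D", "I"}


def is_well_formed(p):
    """True if p is a well-formed process under the assumed grammar."""
    tag = p[0]
    if tag in ("prim", "skip"):
        return True
    if tag == "bracket":
        return is_well_formed(p[2])
    if tag == "crash":
        return is_well_formed(p[1])
    if tag in ("seq", "alt"):
        return is_well_formed(p[1]) and is_well_formed(p[2])
    if tag == "par":
        # Parallel composition is allowed only between interleavable processes.
        return is_interleavable(p[1]) and is_interleavable(p[2])
    raise ValueError(f"unknown constructor: {tag}")


def is_interleavable(p):
    """True if p is built at the outermost level from isolated actions."""
    tag = p[0]
    if tag == "skip":
        return True
    if tag == "bracket":
        return p[1] == "I" and is_well_formed(p[2])
    if tag in ("seq", "par", "alt"):
        return is_interleavable(p[1]) and is_interleavable(p[2])
    # Primitive actions and crash(...) are never interleavable.
    return False
```

With `P_I = ("bracket", "I", ("prim", "p"))`, `Q_I` defined likewise and `FAIL = ("crash", ("skip",))`, the term `("par", P_I, Q_I)` is accepted, while `("par", ("par", P_I, Q_I), FAIL)` is rejected, which matches the example discussed above.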

Canonical processes. Informally, a process in canonical form consists of a non- 
deterministic choice of one or more alternatives. Each alternative might start 
with an optional crash/recovery and is then followed by a sequence of primitive 
actions, possibly nested inside atomic, isolated or durable brackets. Note that 
a crash/recovery at the very beginning of a process has no observable effect, as 
there are no actions that can be affected by it. The existence of canonical forms 
gives us an important proof principle for transactions. To prove a property $\mathcal{P}(P)$ 
of some process $P$, we transform $P$ into an equivalent process $C$ in canonical 
form, and prove instead $\mathcal{P}(C)$. The latter is usually much easier than the former, 
since processes in canonical form contain neither embedded failures nor parallel 
compositions. 

Locally consistent processes. We now define a class of well-formed processes that 
are “locally consistent”, i.e., that are built out of sequences of primitive actions 
assumed to preserve the consistency of the system. We make this intuition clearer 
in the following. 

To define consistency without talking about the primitive operations on the 
memory, we assume that we are given a set of finite sequences of primitive 
actions. The elements of this set will be called locally consistent sequences. In- 
tuitively, a locally consistent sequence is intended to preserve the consistency of 
the global system if executed completely and in order, but will not necessarily do 
so if it is executed partially or with the interleaving of other primitive actions. 
So with respect to a given set of locally consistent sequences, a locally consistent 
process is a well-formed process in which every occurrence of a primitive action 
must be part of a locally consistent sequence inside atomic brackets. 

3 Equational Theory 

We now define an equivalence relation on processes that is meant to reflect our in- 
formal understanding of all-or-nothing actions, isolation, durability, concurrency 
and failure. This equivalence relation will be defined as the smallest congruence 
that contains a set of equality axioms. We are thus defining an equational theory. 

Structural equalities. The first set of equality axioms are called structural equal- 
ities because they reflect obvious facts about the algebraic structure of the com- 
position operators and skip. 
- Parallel composition ($\parallel$) is associative (1), commutative (2) and has skip as 
  identity element (3). 

- Sequential composition (;) is associative (4) and has skip as right identity 
  (5) and left identity (6). 

- Alternation (+) is associative (7), commutative (8) and idempotent (9). 

- Furthermore, alternation distributes over every other operator, capturing the 
  idea that a choice made at a deep level of the program determines a choice 
  for the entire program. 

$$
\begin{aligned}
(P + Q) \,;\, R &= P \,;\, R + Q \,;\, R && (10) \\
P \,;\, (Q + R) &= P \,;\, Q + P \,;\, R && (11) \\
(P + Q) \parallel R &= P \parallel R + Q \parallel R && (12) \\
\langle P + Q \rangle_k &= \langle P \rangle_k + \langle Q \rangle_k \qquad k \in \{A, D, I\} && (13) \\
\mathit{crash}(P + Q) &= \mathit{crash}(P) + \mathit{crash}(Q) && (14)
\end{aligned}
$$

- An empty action has no effect. 

$$\langle \mathit{skip} \rangle_k = \mathit{skip} \qquad k \in \{A, D, I\} \qquad (15)$$



Interleaving equality. Isolation means that a parallel composition of two isolated 
processes must be equivalent to some sequential composition of these processes. 
This corresponds to the following interleaving law. 

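$$
\begin{aligned}
\langle P \rangle_I \,;\, P' \parallel \langle Q \rangle_I \,;\, Q' \;=\;& \langle P \rangle_I \,;\, (P' \parallel \langle Q \rangle_I \,;\, Q') \;+ \\
& \langle Q \rangle_I \,;\, (Q' \parallel \langle P \rangle_I \,;\, P') \qquad (16)
\end{aligned}
$$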
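
Unfolding law (16) repeatedly, with skip as the identity of parallel composition (law (3)) as the base case, enumerates every serialization of two sequences of isolated actions. The sketch below illustrates this for flat sequences only; the function name `serializations` and the representation of each isolated action by a string are assumptions made for the example.

```python
def serializations(xs, ys):
    """All sequential orders of (xs || ys), obtained by unfolding law (16).

    xs and ys are lists of names, each name standing for one isolated action.
    """
    if not xs:
        return [list(ys)]     # skip || ys = ys, by the identity law (3)
    if not ys:
        return [list(xs)]
    # First summand of (16): the head of xs runs first.
    head_left = [[xs[0]] + rest for rest in serializations(xs[1:], ys)]
    # Second summand of (16): the head of ys runs first.
    head_right = [[ys[0]] + rest for rest in serializations(ys[1:], xs)]
    return head_left + head_right
```

Each returned list is one alternative of the resulting non-deterministic choice. The order of actions within `xs` and within `ys` is preserved in every alternative, which reflects that isolation brackets define the grain of the interleaving.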

Global failure equalities. When we write crash(P) we indicate that one or more 
failures, each followed by a recovery, will happen during the execution of P. The 
equalities below make it possible to find equivalent processes that are simpler in 
the sense that we know more accurately where failures can possibly take place. 

- If failures occurred during the sequence $P \,;\, Q$ they might have occurred 
  during $P$, or during $Q$, or during both $P$ and $Q$: 

  $$\mathit{crash}(P \,;\, Q) = \mathit{crash}(P) \,;\, \mathit{crash}(Q) \qquad (17)$$

  According to our intuitive understanding of crash(P), namely, one or more 
  failures during P, equation (17) is not very natural. It seems that we might 
  expect to have two or more failures on the right-hand side, whereas there 
  is only one or more on the left-hand side. Maybe it would have been more 
  natural to write 

  $$\mathit{crash}(P \,;\, Q) = P \,;\, \mathit{crash}(Q) + \mathit{crash}(P) \,;\, Q + \mathit{crash}(P) \,;\, \mathit{crash}(Q)$$

  In fact, the above equality holds in our theory, because P ; crash(Q) and 
  crash(P) ; Q can be shown to be “special cases” of crash(P) ; crash(Q). 
  (We say that Q is a “special case” of P if there exists a process R such that 
  $P = Q + R$.) 
- Based on our informal definition of the operator crash(·) it is natural to see 
  every process crash(P) as a fixed point of crash(·): contaminating an already- 
  failed process with additional failures has no effect. 

  $$\mathit{crash}(\mathit{crash}(P)) = \mathit{crash}(P) \qquad (18)$$

- A crashed primitive action may either have executed normally or not have 
  executed at all. But in each alternative we propagate the failure to the left 
  so that it can affect already performed actions. 

  $$\mathit{crash}(a) = \lightning \,;\, a + \lightning \qquad (19)$$

- A crashed A-action behaves in the same way, in accordance with its informal 
  “all or nothing” meaning. 

  $$\mathit{crash}(\langle P \rangle_A) = \lightning \,;\, \langle P \rangle_A + \lightning \qquad (20)$$

- A failure during a D-action is propagated both inside the D-brackets so that 
  it can affect the nested process, and before the D-brackets so that it can 
  affect previously performed actions. 

  $$\mathit{crash}(\langle P \rangle_D) = \lightning \,;\, \langle \mathit{crash}(P) \rangle_D \qquad (21)$$

- Since we consider only well-formed processes, we know that a term of the 
  form crash(P) cannot be composed in parallel with any other term. Hence, 
  isolation brackets directly inside crash(·) are superfluous. 

  $$\mathit{crash}(\langle P \rangle_I) = \mathit{crash}(P) \qquad (22)$$

Failure event equalities. We will now consider the effect of a failure event ($\lightning$) 
on actions that have already been performed. 

- If the failure was preceded by a primitive action or an A-action, then either 
  the effect of those actions is completely undone, or the failure did not have 
  any effect at all. In either case we propagate the failure event to the left so 
  that it can affect previous actions. 

  $$a \,;\, \lightning = \lightning \,;\, a + \lightning \qquad (23)$$

  $$\langle P \rangle_A \,;\, \lightning = \lightning \,;\, \langle P \rangle_A + \lightning \qquad (24)$$

- A crash/recovery can in principle act on every action that precedes it. But 
  a durable transaction that has completed becomes, by design, resistant to 
  failure. 

  $$\langle P \rangle_D \,;\, \lightning = \lightning \,;\, \langle P \rangle_D \qquad (25)$$

- If an I-action is followed by a failure event, we know that parallel composition 
  is impossible, so the isolation brackets are again superfluous. 

  $$\langle P \rangle_I \,;\, \lightning = P \,;\, \lightning \qquad (26)$$
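
Applied repeatedly to a flat sequence of actions followed by one failure event, laws (23), (24) and (25) push the failure to the front and leave a choice among the possible surviving sequences. The following sketch enumerates those survivors; it is an illustration only, the function name `outcomes_after_failure` and the tagging of actions by `"prim"`, `"A"` and `"D"` are assumptions, and bracketed actions are treated as opaque units.

```python
from itertools import product


def outcomes_after_failure(actions):
    """Surviving action sequences for (actions ; failure).

    actions is a list of (name, kind) pairs with kind in {"prim", "A", "D"}.
    Each returned tuple stands for one alternative of the resulting choice,
    with the leading failure event left implicit.
    """
    choices = []
    for name, kind in actions:
        if kind == "D":
            # Law (25): a completed durable action always survives.
            choices.append([(name,)])
        elif kind in ("prim", "A"):
            # Laws (23) and (24): the action is either kept or undone.
            choices.append([(name,), ()])
        else:
            raise ValueError(f"unknown kind: {kind}")
    # Combine one choice per action, preserving the original order.
    return {sum(combo, ()) for combo in product(*choices)}
```

For a primitive action a, a durable action d and an all-or-nothing action b in that order, the four alternatives are d alone, a then d, d then b, and a then d then b: the durable action appears in every alternative, while each of the other two may be lost independently.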
Nested failure equalities. The effects of a failure are local. This means that a 
failure inside some action cannot escape it to affect outer actions. Furthermore, 
a crash/recovery at the beginning of an action has no effect on the action’s state, 
because nothing has been done yet. We can safely ignore such crash/recoveries 
if the enclosing action is isolated or durable: 

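$$\langle \lightning \,;\, P \rangle_D = \langle P \rangle_D \qquad (27)$$

$$\langle \lightning \,;\, P \rangle_I = \langle P \rangle_I \qquad (28)$$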

By contrast, a crash/recovery at the beginning of an atomic action will abort 
that action: 



$$\langle \lightning \,;\, P \rangle_A = \mathit{skip} \qquad (29)$$

This is realistic, since a failure that occurs after the start of an atomic action will 
have the effect of aborting that action, no matter whether the action has already 
executed some of its internal code or not. This is also necessary from a technical 
point of view, since a crash/recovery at the beginning of an atomic action might 
be the result of rewriting a later crash/recovery using laws (19), (20), (23) or 
(24). In that case, the sequence of sub-actions inside the $\langle \cdot \rangle_A$ may be only partial 
and therefore must be dropped in order to satisfy the all-or-nothing principle. 
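
As an illustration of this technical point, the following worked derivation (not taken from the paper) rewrites an atomic action whose second primitive action crashes. It uses only the laws stated above, with associativity (4) applied silently, and shows that every alternative ends up with a failure at the start of the atomic brackets and is therefore dropped.

```latex
\begin{aligned}
\langle a \,;\, \mathit{crash}(b) \rangle_A
  &= \langle a \,;\, (\lightning \,;\, b + \lightning) \rangle_A
     && \text{by (19)} \\
  &= \langle a \,;\, \lightning \,;\, b \rangle_A + \langle a \,;\, \lightning \rangle_A
     && \text{by (11), (13)} \\
  &= \langle (\lightning \,;\, a + \lightning) \,;\, b \rangle_A
     + \langle \lightning \,;\, a + \lightning \rangle_A
     && \text{by (23)} \\
  &= \langle \lightning \,;\, a \,;\, b \rangle_A + \langle \lightning \,;\, b \rangle_A
     + \langle \lightning \,;\, a \rangle_A + \langle \lightning \,;\, \mathit{skip} \rangle_A
     && \text{by (10), (13), (5)} \\
  &= \mathit{skip} + \mathit{skip} + \mathit{skip} + \mathit{skip}
     && \text{by (29)} \\
  &= \mathit{skip}
     && \text{by (9)}
\end{aligned}
```

The second alternative in the fourth line contains only b, a partial execution of the original sequence; law (29) is what removes it.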

Admissible Equalities. Using the equational theory, we can show that the fol- 
lowing four equalities also hold. 

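$$
\begin{aligned}
\lightning \,;\, \lightning &= \lightning \\
\mathit{crash}(\lightning) &= \lightning \\
\mathit{crash}(P) \,;\, \lightning &= \mathit{crash}(P) \\
\lightning \,;\, \mathit{crash}(P) &= \mathit{crash}(P)
\end{aligned}
$$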

The first two equalities are simple consequences of laws (6), (17) and (18). 

The last two equalities are not directly derivable from the given axioms. How- 
ever, one can show that they hold for all processes that result from substituting 
a concrete well-formed closed process for the meta-variable P. 
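
The derivations of the first two equalities are short enough to spell out. The steps below are a reconstruction from the cited laws rather than a quotation from the paper; they unfold the shorthand for a single failure event as crash(skip).

```latex
\begin{aligned}
\lightning \,;\, \lightning
  &= \mathit{crash}(\mathit{skip}) \,;\, \mathit{crash}(\mathit{skip})
     && \text{definition of } \lightning \\
  &= \mathit{crash}(\mathit{skip} \,;\, \mathit{skip})
     && \text{by (17)} \\
  &= \mathit{crash}(\mathit{skip})
     && \text{by (6)} \\
  &= \lightning \\[1ex]
\mathit{crash}(\lightning)
  &= \mathit{crash}(\mathit{crash}(\mathit{skip}))
     && \text{definition of } \lightning \\
  &= \mathit{crash}(\mathit{skip})
     && \text{by (18)} \\
  &= \lightning
\end{aligned}
```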

The “harmless case” property. Among the possible effects of a failure there is 
always in our system the case where the crash/recovery has no effect. More 
precisely, for every process $P$, it holds that $\lightning \,;\, P$ is a special case of crash(P). 
This conforms to the usual intuition of failure: we know that something has gone 
wrong but we do not know exactly what the effects have been, so we must also 
consider the case where nothing bad occurred. 

4 Meta-theoretical Properties 

We now establish the two main theorems of our calculus. The first is that every 
process is equivalent to a unique process in canonical form. Since canonical forms 
contain neither parallel compositions nor failures (except at the very beginning), 
this gives us a theory for reasoning about decomposed transactions with failures. 
Theorem 1 (Existence and Uniqueness of Canonical Form). For each 
well-formed process P there is one equivalent process in canonical form. Furthermore, 
this latter process is unique modulo associativity, commutativity and 
idempotence of (+) (axioms (7), (8), (9)), associativity of (;) (axiom (4)) and 
the simplifications involving skip (axioms (5), (6), (15)). We call this process the 
canonical form of P. 

The proof of this theorem is based on a rewriting system modulo a set of 
structural rules which is shown equivalent to the equational theory. Canonical 
forms are then normal forms (irreducible terms) of the rewriting system. 

The second theorem is that the reduction of a process to its canonical form 
preserves its consistency. This theorem guarantees that our equational calculus 
conforms to the usual behavior of a transaction system, which requires that local 
consistency of transactions implies global consistency of the system. 

Theorem 2 (Preservation of Consistency). The canonical form of a locally 
consistent process is also a locally consistent process. 

Conclusion 

This paper presents an axiomatic, equational semantics for all-or-nothing, iso- 
lated, and durable actions. Such actions may be nested, and may be composed 
using parallel composition, sequential composition and alternation. Traditional 
transactions correspond to nested A-D-I-actions. The semantics is complete, in 
the sense that it can be used to prove that local consistency of individual trans- 
actions implies global consistency of a transaction system. 

The work done in this paper could be used to better understand the interplay 
between actions that guarantee only some of the ACID properties. These kinds of 
actions are becoming ever more important in application servers and distributed 
transaction systems, which go beyond centralized databases. 

We have argued informally that our axioms capture the essence of decom- 
posed transactions. It would be useful to make this argument more formal, for 
instance by giving an abstract machine that implements a transactional store, 
and then proving that the machine satisfies all the equational axioms. This is 
left for future work. 

Another continuation of this work would consider failures that can escape 
their scope and affect enclosing actions in a limited way. A failure could then be 
tagged with an integer, as in $\lightning_n$, which would represent its severity, i.e., the 
number of failure-tight compartments that the failure can go through. 

The process notion presented in this paper is a very high-level abstraction 
of a transaction system. This is a first attempt at formalizing transactions, and 
has allowed us to prove some properties which are only true at this level of 
abstraction. Now it is necessary to refine our abstraction in order to take into 
account replication, communication and distribution. We hope to find a con- 
servative refinement, i.e., a refinement for which the proofs in this paper still 
work. 
Abstract. Courcelle introduced the study of regular words, i.e., words 
isomorphic to frontiers of regular trees. Heilbrunner showed that a nonempty 
word is regular iff it can be generated from the singletons by 
the operations of concatenation, omega power, omega-op power, and the 
infinite family of shuffle operations. We prove that the nonempty regular 
words, equipped with these operations, are the free algebras in a variety 
which is axiomatizable by an infinite collection of some natural equations. 



By “word” we understand a labeled linear order, extending the familiar notion. 
Courcelle [Cour78] introduced the study of such words (“arrangements”, in his 
terminology). He showed that every finite or countable word is isomorphic to the 
frontier of a (complete) binary tree, where the linear order on the leaves of the 
tree is the lexicographic order. He introduced several operations on such words, 
including concatenation (or product), omega and omega-op powers. He proved 
that initial solutions of finite systems of fixed point equations 

$$x_i = u_i, \quad i = 1, 2, \ldots, k, \qquad (1)$$

(where $x_1, \ldots, x_k$ are variables and the $u_i$ are finite words on the letters in a 
set A and the variables) are isomorphic to frontiers of regular trees. Further, 
he showed that the solutions of certain kinds of systems can be expressed by 
“quasi-rational” expressions, formed from single letters in A using the operations 
of concatenation, omega and omega-op power. Courcelle asked for a complete 
set of axioms for these operations. In [BlCho01], a complete set of axioms for 
just the concatenation and omega power operation on words was given, and in 
[BlEs03a] Courcelle’s question was answered. 

We call a word which is isomorphic to the frontier of a regular binary tree 
a regular word. Several results on regular words have been obtained by 
Heilbrunner [Heil80], Thomas [Th86], and the authors [BlEs03]. A summary of some 
of the results of Courcelle and Heilbrunner is given below in Theorem 1. Thomas 
gave an algorithm to determine when two terms denote isomorphic words. His 
algorithm is based on Rabin’s theorem on automata for infinite trees. 

Heilbrunner discussed several identities involving the terms with both Cour- 
celle’s operations, as well as the shuffle operations, but did not obtain a com- 
pleteness result. In the present paper we provide a complete system of axioms 
for the terms. This proves that the regular words are the free algebras in the 
variety defined by these equations. We show also that the equational theory of 
this variety is decidable in polynomial time. In the full version of the paper we 
will also show the impossibility of a finite axiomatization. The completeness 
theorem and the corresponding complexity result, Theorem 3, provide a solution to 
a problem that has been open for over twenty years. 

1 Preliminaries 

We assume some familiarity with linearly ordered sets $(L, <)$, and morphisms 
of linearly ordered sets (i.e., order preserving functions). Some common 
linearly ordered sets are $\mathbb{Z}$, the usual ordering on all integers; $\mathbb{Q}$, the standard 
ordering on all rationals; $\omega$, the linearly ordered set of the nonnegative 
integers, and $\omega^{op}$, the linearly ordered set of the negative integers. For $n \in \omega$, 
$[n]$ denotes the $n$ element set $\{1, 2, \ldots, n\}$, ordered as usual. 

A linear order (L, <) is quasi-dense if there is an injective morphism 

$$(\mathbb{Q}, <) \to (L, <).$$

If $(L, <)$ is not quasi-dense, it is scattered [Ro82] or discrete [BlEs03]. 

An interval of $L$ is a subset $I$ of $L$ such that if $x < y < z$ in $L$, and if 
$x, z \in I$, then $y \in I$. (Thus, in particular, the empty set is an interval, as is any 
singleton subset.) 

Suppose that $(L, <)$ is a linear order. A basic open interval in $(L, <)$ is 
$(p, q) = \{x : p < x < q\}$, for $p < q$ in $L$, or $p = -\infty$ or $q = \infty$. The notation 
$(p, \infty)$ means $\{x : p < x\}$; $(-\infty, q) = \{x : x < q\}$, and $(-\infty, \infty) = L$. An 
interval $I$ is open if for each point $x \in I$ there is a basic open interval $(p, q)$ with 
$p < x < q$ and $(p, q) \subseteq I$. (Here, $p$ may be $-\infty$, and $q$ may be $\infty$.) An interval $I$ 
is dense if $I$ has at least two distinct points and for all $x, y \in I$, if $x < y$, then 
there is some $z \in I$ with $x < z < y$. 

2 Words 

A word $(L_u, <, u, A)$ (a “generalized word” in Thomas [Th86], “arrangement” 
in Courcelle [Cour78]) consists of a linearly ordered set $(L_u, <)$, a set $A$ of labels, 
and a labeling function $u : L_u \to A$. The range of the labeling function is called 
the alphabet of the word, denoted alph($u$). 

Usually, we use just the labeling function to name the word, and let $(L_u, <)$ 
denote the underlying linear order of the word $u$. We say that $u$ is a word on $A$, 




52 S.L. Bloom and Z. Esik 

over the linear order $(L_u, <)$. The linear order $(L_u, <)$ is the underlying linear 
order of $u$. When $L_u$ is empty, we have the empty word on $A$, written 1. 

Suppose that $u$ and $v$ are words over $A$ with underlying linear orders $(L_u, <)$ 
and $(L_v, <)$, respectively. A morphism $h : u \to v$ is a morphism $h : (L_u, <) \to (L_v, <)$ 
which preserves the labeling: 

$$u(x) = v(h(x)), \quad x \in L_u.$$

Thus, for any set A, the collection of words on A forms a category. Two words 
$u, v$ on $A$ are isomorphic when they are isomorphic in this category, i.e., when 
there are morphisms $h : u \to v$, $g : v \to u$ such that $u \xrightarrow{h} v \xrightarrow{g} u$ and $v \xrightarrow{g} u \xrightarrow{h} v$ 
are the respective identities. We write 

$$u \cong v$$

to indicate that u and v are isomorphic. We usually identify isomorphic words. 

If $u, v$ are words on $A$, we say $v$ is a subword of $u$ if $L_v$ is an interval of 
$(L_u, <)$ and for $p \in L_v$, $v(p) = u(p)$. 

3 Some Operations on Words 

We will define the collection of “regular operations” on words by means of word 
substitution. First, we define sum and generalized sum [Ro82] for linear orders. 

Definition 1 (sum). Suppose that $L_1$ and $L_2$ are linear orders with disjoint 
underlying sets. Then $L_1 + L_2$ is the linear order on the union of the two sets 
defined by: 

$$x < y \iff (x < y \text{ in } L_1 \text{ or in } L_2) \text{ or } (x \in L_1 \text{ and } y \in L_2).$$

Definition 2 (generalized sum). Suppose that $(L, <)$ is a linear order, and 
for each $x \in L$, let $(K_x, <)$ be a linear order. The ordering 

$$\sum_{x \in L} K_x$$

obtained by substitution of $K_x$ for $x \in L$, is defined as follows: the underlying 
set is the set of pairs $(k, x)$ with $x \in L$ and $k \in K_x$, ordered by: 

$$(k, x) < (k', x') \iff x < x' \text{ or } (x = x' \text{ and } k < k').$$

Definition 3 (substitution). Let $u$ be a word with alph($u$) $\subseteq A = \{a_1, \dots, a_n\}$,
and let $v_{a_i}$ be a word on the set $B$, for each $i \in [n]$. The sets
$A$, $B$ need not be the same. We define $w = u(a_1/v_{a_1}, \dots, a_n/v_{a_n})$, the word on
the set $B$ obtained by substituting $v_{a_i}$ for each occurrence of $a_i$ in $u$, as follows.
The underlying order of $w$ is the linear order $\sum_{x \in L_u} L_{v_{u(x)}}$, defined just above,
labeled as follows:

$w(k, x) := v_{u(x)}(k), \quad x \in L_u,\ k \in L_{v_{u(x)}}.$
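For finite words the substitution can be sketched directly. The following is a minimal illustration only, assuming words are represented as Python strings; the names `substitute` and `label_string` are hypothetical and not from the paper. Each point of the result is a pair (k, x), listed in the order of the generalized sum (first by x, then by k).

```python
def substitute(u, subst):
    """Return the points of u(a/subst[a]) as ((k, x), label) pairs, in order.

    u     : a finite word, given as a string over the alphabet A
    subst : dict mapping each letter of A to a finite word over B
    """
    points = []
    for x, letter in enumerate(u):          # x ranges over the order of u
        v = subst[letter]                   # the word substituted at x
        for k, label in enumerate(v):       # k ranges over the order of v
            points.append(((k, x), label))  # w(k, x) := v_{u(x)}(k)
    return points


def label_string(points):
    """Read off the labels of the points in order (hypothetical helper)."""
    return "".join(label for _, label in points)


# Composition u . v arises as c(a1/u, a2/v) with c the two-letter word "12".
example = substitute("12", {"1": "ab", "2": "c"})
```

Here `label_string(example)` reads the concatenation of the two substituted words, matching the definition of composition by substitution into a two-point word.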
Define the following words on the countable set $\{a_1, a_2, \dots\}$.

— $c := a_1 a_2$, the word $([2], <, u)$ with $u(i) = a_i$.

— $\rho_{\omega} := a_1 a_1 \dots$, the word whose underlying linear order is $\omega$, each point of
which is labeled $a_1$.

— $\rho_{\omega^{op}} := \dots a_1 a_1$, the word whose underlying linear order is $\omega^{op}$, each point
of which is labeled $a_1$.

— For $1 \le n < \omega$, $\rho_n$ is the word whose underlying linear order is $\mathbb{Q}$, every
point labeled by some $a_i$, $i \in [n]$, and between any two points $q < q'$ in $\mathbb{Q}$,
for each $j \in [n]$ there is a point labeled $a_j$. There is a unique such word, up
to isomorphism. See [Ro82], p. 116.

We define the regular operations of composition $u \cdot v$, omega power $u^{\omega}$, omega-op
power $u^{\omega^{op}}$ and shuffle, $[u_1, \dots, u_n]^{\eta}$.

Definition 4 (regular operations). For any words $u, v, u_1, \dots, u_n$, $n \ge 1$, on
$A$:

$u \cdot v := c(a_1/u, a_2/v)$

$u^{\omega} := \rho_{\omega}(a_1/u)$

$u^{\omega^{op}} := \rho_{\omega^{op}}(a_1/u)$

$[u_1, \dots, u_n]^{\eta} := \rho_n(a_1/u_1, \dots, a_n/u_n).$

There is one shuffle operation for each positive integer $n$,

$(u_1, \dots, u_n) \mapsto [u_1, \dots, u_n]^{\eta}.$

Also, $\rho_n = [a_1, \dots, a_n]^{\eta}$. (If the binary shuffle operation were associative, we
could restrict ourselves to just one shuffle operation. Unfortunately, it is not.)

Definition 5 (terms). Let $A$ be a fixed set.

1. A term on the set $A$ is either some letter $a \in A$, or $t \cdot t'$, $t^{\omega}$, $t^{\omega^{op}}$, or
$[t_1, \dots, t_n]^{\eta}$, where $t, t', t_1, \dots, t_n$ are terms on $A$.

2. A discrete term is a term that has no occurrence of the function symbol $\eta$.

3. The height of a term $t$, denoted ht($t$), is the maximum number of nested $\omega$,
$\omega^{op}$ and $\eta$ operations in $t$.

4. When $t$ is a term on the alphabet $A$, we write $|t|$ for the word on $A$ denoted
by $t$. More precisely, for $a \in A$, $|a|$ is a singleton set, labeled $a$, and, using
induction, we define

$|t \cdot t'| := |t| \cdot |t'|, \quad |t^{\omega}| := |t|^{\omega}, \quad |t^{\omega^{op}}| := |t|^{\omega^{op}}, \quad |[t_1, \dots, t_n]^{\eta}| := [|t_1|, \dots, |t_n|]^{\eta}.$

5. An equation $t = t'$ between two terms on $A$ is valid, and $t$ and $t'$ are equivalent,
if $|t| = |t'|$.

Terms of height zero are called “finite” . Those of positive height are “infinite” . 
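The notions of height and discreteness can be made concrete with a small term datatype. This is only a sketch under assumed names (`Letter`, `Prod`, `Omega`, `OmegaOp`, `Shuffle`, `height`, `is_discrete` are all hypothetical): the height counts the deepest nesting of omega, omega-op and shuffle operations, and a term is discrete when no shuffle occurs in it.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Letter:
    name: str


@dataclass(frozen=True)
class Prod:
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Omega:
    arg: "Term"


@dataclass(frozen=True)
class OmegaOp:
    arg: "Term"


@dataclass(frozen=True)
class Shuffle:
    args: Tuple["Term", ...]


Term = Union[Letter, Prod, Omega, OmegaOp, Shuffle]


def height(t: Term) -> int:
    """Maximum number of nested omega, omega-op and shuffle operations."""
    if isinstance(t, Letter):
        return 0
    if isinstance(t, Prod):
        return max(height(t.left), height(t.right))
    if isinstance(t, (Omega, OmegaOp)):
        return 1 + height(t.arg)
    return 1 + max(height(s) for s in t.args)  # Shuffle


def is_discrete(t: Term) -> bool:
    """True when the shuffle symbol does not occur in t."""
    if isinstance(t, Letter):
        return True
    if isinstance(t, Prod):
        return is_discrete(t.left) and is_discrete(t.right)
    if isinstance(t, (Omega, OmegaOp)):
        return is_discrete(t.arg)
    return False  # Shuffle
```

A term is "finite" in the sense above exactly when `height` returns 0, that is, when it is a product of letters.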

A binary tree on A consists of a finite or infinite binary tree (considered as
a nonempty, prefix closed set of binary strings, as usual) and a labeling function 
mapping the set of leaves of the tree to A. The leaves, when ordered lexico- 
graphically, form a linearly ordered set. When T is a binary tree on A and u 
is a vertex, the subtree rooted at vertex u is also a binary tree on A. A tree is 
regular if up to isomorphism it has a finite number of subtrees. The frontier of 
a tree on A is the A-labeled linearly ordered set of the leaves. 

Definition 6 (regular word). A word u on A is called regular if it is iso- 
morphic to the frontier of a regular binary tree on A. 
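For a finite tree the frontier is easy to compute. The sketch below is illustrative only and assumes the tree is given by its leaves, as a dict from binary strings to letters (the name `frontier` is hypothetical). Since no leaf is a prefix of another leaf, sorting the binary strings gives the lexicographic order of the leaves.

```python
def frontier(leaf_labels):
    """Return the frontier of a finite binary tree as a string.

    leaf_labels maps each leaf, written as a binary string such as "01",
    to its letter. Inner vertices are the proper prefixes of the leaves.
    """
    leaves = sorted(leaf_labels)
    # Sanity check: in sorted order, a leaf that is a prefix of another
    # leaf would be immediately followed by one of its extensions.
    for v, w in zip(leaves, leaves[1:]):
        if w.startswith(v):
            raise ValueError(f"{v!r} is a prefix of {w!r}, so it is not a leaf")
    return "".join(leaf_labels[v] for v in leaves)
```

Infinite regular trees need a finite presentation (for instance a system of fixed point equations), which this sketch does not attempt.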

Note that the alphabet of each regular word is finite. The following theorem
summarizes some of the results in Heilbrunner [Heil80] and Courcelle [Cour78].

Theorem 1. The following are equivalent for a word u on A. 

— u is regular. 

— u is a component of the initial solution to a system of fixed point equations 
of the form (1) above. 

— $u$ is either the empty word, or belongs to the least collection of words containing
the singletons $a$, $a \in A$, closed under the regular operations, i.e.,
when there is a term $t$ on $A$ with $u = |t|$.

Also, if t is a term on A, then |t| has a scattered underlying linear order iff t is 
a discrete term. □ 

We single out an important class of terms. 

Definition 7 (primitive term). A term on $A$ is primitive if it has one of
the following forms, for $a_1, \dots, a_n, b_1, \dots, b_k, c_1, \dots, c_m \in A$:

— $a_1 \cdots a_n$, $n > 0$,

— $a_1 \cdots a_n (b_1 \cdots b_k)^{\omega}$, $n > 0$, $k > 0$,

— $(c_m \cdots c_1)^{\omega^{op}} a_1 \cdots a_n$, $n > 0$, $m > 0$,

— $(c_m \cdots c_1)^{\omega^{op}} a_1 \cdots a_n (b_1 \cdots b_k)^{\omega}$, $m, k > 0$, $n > 0$,

— $[a_1, \dots, a_n]^{\eta}$, $n > 0$,

— $a_i [a_1, \dots, a_n]^{\eta}$, some $n > 0$, $i \in [n]$,

— $[a_1, \dots, a_n]^{\eta} a_i$, some $n > 0$, $i \in [n]$,

— $a_i [a_1, \dots, a_n]^{\eta} a_j$, some $n > 0$, $i, j \in [n]$.

We say that a word $u$ on $A$ is primitive if $u$ is isomorphic to $|t|$, for some
primitive term $t$.

Later, to abbreviate several possibilities, we sometimes write

$(a_i)[a_1, \dots, a_n]^{\eta}$

to mean either $a_i \cdot [a_1, \dots, a_n]^{\eta}$ or $[a_1, \dots, a_n]^{\eta}$. Similarly, we write

$[a_1, \dots, a_n]^{\eta}(a_j)$

to mean either $[a_1, \dots, a_n]^{\eta}$ or $[a_1, \dots, a_n]^{\eta} \cdot a_j$. Last, we write

$(a_i)[a_1, \dots, a_n]^{\eta}(a_j)$

to mean either $a_i \cdot [a_1, \dots, a_n]^{\eta} \cdot a_j$ or $(a_i)[a_1, \dots, a_n]^{\eta}$ or $[a_1, \dots, a_n]^{\eta}(a_j)$.



4 Blocks 

We will partition a word into subwords of two kinds: minimal successor-closed 
subwords and maximal “uniform” subwords. 

Definition 8 (uniform word). Suppose that $u$ is a word on the set $A$ and let
$X$ be a nonempty subset of $A$. A subword $v$ of $u$ is $X$-uniform if

— $L_v$ is an open, dense interval of $L_u$, and

— for each $p \in L_v$, the label of $p$, $u(p)$, belongs to $X$, and

— for each $p, q \in L_v$ and $a \in X$, if $p < q$, then there is some $r \in L_v$ with
$p < r < q$ and $u(r) = a$.

We say a subword of $u$ is uniform if it is $X$-uniform, for some $X$.

Remark 1. Note that if $v$ is an $X$-uniform subword of $u$, then alph($v$) $= X$.
Thus, if $v$ is a uniform subword of $u$, then there is a unique $X$ such that $v$ is
$X$-uniform. Also, if $X = \{a_1, \dots, a_n\}$, then any $X$-uniform subword $v$ of $u$ is
isomorphic to either

$[a_1, \dots, a_n]^{\eta}$, $a_i[a_1, \dots, a_n]^{\eta}$, $[a_1, \dots, a_n]^{\eta} a_j$, or $a_i[a_1, \dots, a_n]^{\eta} a_j$,

for some $i, j \in [n]$.

We say that a subword $v$ of the word $u$ is a maximal uniform subword of $u$ if
$v$ is uniform, and whenever $v \subseteq v'$, for a uniform subword $v'$ of $u$, then $v = v'$.

Proposition 1. Any uniform subword of the word $u$ is contained in a unique
maximal uniform subword.

We introduce a modification of the usual notion of a “successor” in a linear 
ordered set. 

Definition 9. Suppose that $u$ is a word and $p, q \in L_u$. We write $q = S(p)$ if

1. $p < q$ and there is no $w \in L_u$ with $p < w < q$, and

2. $q$ does not belong to any uniform subword of $u$.

Similarly, we write $q = P(p)$ if

1. $q < p$ and there is no $w \in L_u$ with $q < w < p$, and

2. $q$ does not belong to any uniform subword of $u$.

For any word $u$ and any point $p \in L_u$, either there is some $q = S(p)$, or not. If
so, we say $S(p)$ exists; similarly, if there is some $q = P(p)$, we say $P(p)$ exists.
For $n \ge 0$, define:

$S_0(p) := p$

$S_{n+1}(p) := S(S_n(p))$, if both $S_n(p)$ and $S(S_n(p))$ exist

$S_{-(n+1)}(p) := P(S_{-n}(p))$, if both $S_{-n}(p)$ and $P(S_{-n}(p))$ exist.



The set

$\{S_k(p) : k \in \mathbb{Z},\ S_k(p) \text{ exists}\}$

is a nonempty interval, since $S_0(p)$ always exists.



Definition 10. Suppose that $u$ is a word and $v$ is a subword of $u$ such that no
point of $v$ is contained in a uniform subword. We say that $v$ is s-closed
if for each $p$ in $v$, if $S(p)$ exists then $S(p)$ is also in $v$, and similarly for $P(p)$. A
minimal s-closed subword of $u$ is a nonempty s-closed subword that contains
no other nonempty s-closed subword.

Proposition 2. Suppose that $u$ is a word. A subword $v$ of $u$ is a minimal s-closed
subword of $u$ iff for some point $p \in L_u$ which does not belong to any
uniform subword, $v$ is the collection of points $\{S_k(p) : k \in \mathbb{Z},\ S_k(p) \text{ exists}\}$,
ordered and labeled as in $u$.

We now show how any word may be subdivided (by a particular “condensa- 
tion” [Ro82]), into blocks. 

Definition 11 (blocks). A block of the word u is either a maximal uniform 
subword of u, called a dense block, or a minimal s-closed subword of u that 
does not contain any point belonging to a uniform subword, called a scattered 
block. 



Proposition 3. If $u$ is a word, each point $p$ in $L_u$ belongs to a unique block
Bl($p$); the blocks of $u$ form a partition of $L_u$, and they are linearly ordered by the
relation:

$\mathrm{Bl}(p) < \mathrm{Bl}(q) \iff x < y, \text{ all } x \in \mathrm{Bl}(p),\ y \in \mathrm{Bl}(q).$ □

5 The Axioms 



In this section we list the axioms used in our completeness theorem in Section 
7. We divide the axioms into several groups. 
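Definition 12 (discrete axioms).

$(x \cdot y) \cdot z = x \cdot (y \cdot z)$

$(x \cdot y)^{\omega} = x \cdot (y \cdot x)^{\omega}$

$(x \cdot y)^{\omega^{op}} = (y \cdot x)^{\omega^{op}} \cdot y$

$(x^n)^{\omega} = x^{\omega}, \quad n \ge 2$

$(x^n)^{\omega^{op}} = x^{\omega^{op}}, \quad n \ge 2$

We note two consequences of the discrete axioms.

$x \cdot x^{\omega} = x^{\omega}$

$x^{\omega^{op}} \cdot x = x^{\omega^{op}}$ (2)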
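As a worked check, here is one way the two consequences can be derived. It is a sketch that assumes the discrete axioms read $(x \cdot y)^{\omega} = x \cdot (y \cdot x)^{\omega}$, $(x \cdot y)^{\omega^{op}} = (y \cdot x)^{\omega^{op}} \cdot y$, and $(x^n)^{\omega} = x^{\omega}$, $(x^n)^{\omega^{op}} = x^{\omega^{op}}$ for $n \ge 2$; in each chain the middle step instantiates $y := x$.

```latex
\begin{align*}
x^{\omega} &= (x \cdot x)^{\omega}
  && \text{power axiom, } n = 2 \\
 &= x \cdot (x \cdot x)^{\omega}
  && (x \cdot y)^{\omega} = x \cdot (y \cdot x)^{\omega} \text{ with } y := x \\
 &= x \cdot x^{\omega}
  && \text{power axiom, } n = 2 \\[6pt]
x^{\omega^{op}} &= (x \cdot x)^{\omega^{op}}
  && \text{power axiom, } n = 2 \\
 &= (x \cdot x)^{\omega^{op}} \cdot x
  && (x \cdot y)^{\omega^{op}} = (y \cdot x)^{\omega^{op}} \cdot y \text{ with } y := x \\
 &= x^{\omega^{op}} \cdot x
  && \text{power axiom, } n = 2
\end{align*}
```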



The remaining axioms concern the shuffle operation. We call the first group the 
logical axioms. 

Definition 13 (logical axioms).

$[x_{f(1)}, \dots, x_{f(n)}]^{\eta} = [x_1, \dots, x_p]^{\eta},$

where $f : [n] \to [p]$ is any set-theoretic surjection.

The terminology comes from Eilenberg and Elgot [EiEl], who call a “logical
function” $X^p \to X^n$ one which maps $(x_1, \dots, x_p)$ to $(x_{f(1)}, \dots, x_{f(n)})$, where
$f$ is a function $[n] \to [p]$. The logical axioms say that the shuffle operation
$[a_1, \dots, a_n]^{\eta}$ is a function whose value is determined by the set $\{a_1, \dots, a_n\}$,
not the sequence $(a_1, \dots, a_n)$; for example, using these axioms one may derive
the facts that $[a, a, b]^{\eta} = [b, a, b]^{\eta} = [a, b]^{\eta} = [b, a]^{\eta}$.

The remaining axioms show how the shuffle operation interacts with the 
composition, omega, omega-op operations, and with itself. 

Definition 14 (composition/shuffle axioms). For each $i \in [p]$,

$[x_1, \dots, x_p]^{\eta} \cdot [x_1, \dots, x_p]^{\eta} = [x_1, \dots, x_p]^{\eta} \cdot x_i \cdot [x_1, \dots, x_p]^{\eta} = [x_1, \dots, x_p]^{\eta}.$

Definition 15 (omega/shuffle axioms). For each $i \in [p]$,

$([x_1, \dots, x_p]^{\eta})^{\omega} = ([x_1, \dots, x_p]^{\eta} \cdot x_i)^{\omega} = [x_1, \dots, x_p]^{\eta}.$

Definition 16 (omega-op/shuffle axioms). For each $i \in [p]$,

$([x_1, \dots, x_p]^{\eta})^{\omega^{op}} = (x_i \cdot [x_1, \dots, x_p]^{\eta})^{\omega^{op}} = [x_1, \dots, x_p]^{\eta}.$



Definition 17 (shuffle/shuffle axioms).

$[u_1, \dots, u_k, v_1, \dots, v_s]^{\eta} = [x_1, \dots, x_p]^{\eta}, \quad k \ge 0,\ s > 0,$ (3)

where in (3), the terms $u_i$ are letters in the set $\{x_1, \dots, x_p\}$, and each term $v_j$
is one of the following:

$[x_1, \dots, x_p]^{\eta}$, $x_i[x_1, \dots, x_p]^{\eta}$, $[x_1, \dots, x_p]^{\eta} x_j$, or $x_i[x_1, \dots, x_p]^{\eta} x_j$.

Note that a special case of the shuffle/shuffle axioms is the identity

$([x_1, \dots, x_p]^{\eta})^{\eta} = [x_1, \dots, x_p]^{\eta}.$

Let Ax denote the collection of all axioms. It should be clear that each axiom
is valid.

Proposition 4 (Soundness). If Ax $\vdash s = t$ then $|s| = |t|$. □

Remark 2. The shuffle/shuffle and composition/shuffle axioms were discussed in
[Heil80], as were some of the discrete axioms.

6 Proper Terms 

We introduce the notion of “separated” and “overlapping words” . 

Definition 18. We say the ordered pair of words u, v is separated if any block 
of the product uv is either a block of u or a block of v. If u,v is not separated, 
we say the pair $u, v$ is overlapping. Similarly, we say that a pair of terms $r, t$
is separated or overlapping when the words $|r|$, $|t|$ are.

For example, if $u = [a, b]^{\eta} b$ and $v = [b, c]^{\eta}$, then $u, v$ is overlapping, as is $u =
[a, b]^{\eta}$, $v = b[b, c]^{\eta}$.

Definition 19 (proper term). A term $t$ on $A$ is proper if either $t$ is primitive,
or $t$ is not primitive but one of the following cases holds.

1. $t = (t_1 \cdots t_n)$, $n \ge 2$, and

(a) each term $t_1, \dots, t_n$ is proper, and

(b) for every $i < n$, either $t_i, t_{i+1}$ is separated, or $i < n - 1$, and
$t_i = [a_1, \dots, a_n]^{\eta}$, $t_{i+1} = c$, and $t_{i+2} = [b_1, \dots, b_k]^{\eta}$, where $a_1, \dots, a_n$,
$b_1, \dots, b_k$ and $c$ are letters, $\{a_1, \dots, a_n\} \ne \{b_1, \dots, b_k\}$ and
$c \in \{a_1, \dots, a_n\} \cap \{b_1, \dots, b_k\}$, or

2. $t = (t_1)^{\omega}$, and $t_1 \cdot t_1$ is proper, or

3. $t = (t_1)^{\omega^{op}}$, and $t_1 \cdot t_1$ is proper, or

4. $t = [t_1, \dots, t_k]^{\eta}$ and each of $t_1, \dots, t_k$ is proper, and, furthermore, the
subterms $t_1, \dots, t_k$ cannot be divided into two sets $u_1, \dots, u_n$, $v_1, \dots, v_m$,
$n \ge 0$, $m > 0$, where there are distinct letters $a_1, \dots, a_p$, such that the
terms $u_i$ are letters in the set $\{a_1, \dots, a_p\}$, and each term $v_j$ is of the form
$(a_i)[a_1, \dots, a_p]^{\eta}(a_j)$.

If a term $t$ is proper, either $t$ is primitive and $|t|$ has just one block, or the
blocks of $|t|$ may be constructed from the subterms of $t$.

Proposition 5. For each term $t$ on $A$ there is a proper term $s$ on $A$ such that
Ax $\vdash t = s$ and ht($s$) $\le$ ht($t$).

This proposition is one of the crucial elements in our argument and requires a 
rather long proof. 
7 The Completeness Theorem

We will define a “condensation” $\widehat{u}$ of a regular word $u$, obtained by condensing
the blocks of $u$ to points, i.e., each block of $u$ is an appropriately labeled point
of $\widehat{u}$.

Definition 20. Let $D = D(A)$ be a set that is in bijective correspondence with
the primitive words on $A$, so that $D$ has a distinct letter $d_u$ for each primitive
word $u$ on the alphabet $A$. Suppose that $u$ is a word on $A$ all of whose blocks are
primitive words. We define the word

$\widehat{u}$

as the word on $D$ whose underlying linear order consists of the blocks of $u$,
ordered as indicated above. For $p \in L_u$, the label of the point Bl($p$) is $d_v$, where
$v$ is the subword Bl($p$) of $u$.

For example, if the word u is denoted by the proper term ((a • • [a, b]^, 

the word u will be denoted d', where d, d' are letters in a new alphabet. 

The operation $u \mapsto \widehat{u}$ is defined for all words whose blocks are denoted by
primitive terms.

Proposition 6. A word $u$ on $A$ is regular iff the word $\widehat{u}$ on $D$ is defined and
regular. □

Thus, in particular, each block of a regular word is a primitive word. We note
that this condensation is preserved and reflected by isomorphisms.

Proposition 7. If $u, v$ are regular words on $A$, then

$u = v \iff \widehat{u} = \widehat{v}.$

We make essential use of “proper terms” in the following proposition.

Proposition 8. For each proper term $t$ over $A$, we may construct a term $\widehat{t}$
over $D$ in such a way that $|\widehat{t}| = \widehat{|t|}$, and if $t$ is infinite, ht($\widehat{t}$) $<$ ht($t$). Further,
there is a term morphism $\sigma$ from the terms on $D$ to the terms on $A$ such that
Ax $\vdash \sigma(\widehat{t}) = t$. □

Theorem 2 (Completeness Theorem). For terms $t, s$ over the alphabet $A$,

$|s| = |t| \iff \mathrm{Ax} \vdash s = t.$

We need prove only $|s| = |t| \Rightarrow \mathrm{Ax} \vdash s = t$. By Proposition 5, we may
assume that both $s, t$ are proper. The proof of completeness uses induction on
the maximum height of the terms s,t. If this height is 0, both terms are finite, 
and we need only the associativity axiom, (2). Assuming now that the maximum 
height is at least one, we argue as follows. By Proposition 7, 
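$\widehat{|s|} = \widehat{|t|}$; hence $|\widehat{s}| = |\widehat{t}|$ and Ax $\vdash \widehat{s} = \widehat{t}$, by Proposition 8 and induction;
Ax $\vdash \sigma(\widehat{s}) = \sigma(\widehat{t})$, by standard equational logic; and Ax $\vdash s = t$, by Proposition 8. □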



8 The Decision Algorithm 

Consider the following problem. Given two terms $s, t$ on $A$, is $|s| = |t|$?

The version of this problem for discrete terms was raised by Courcelle 
[Cour78] , and the general question was posed in [Heil80] . We recall that Thomas 
[Th86] showed that this problem is decidable, using Rabin tree automata. How- 
ever, his method did not provide an explicit upper bound on the complexity of 
this problem. The condensation method used to prove the completeness theorem 
gives a polynomial upper bound. 

Theorem 3. There is an 0{n^)~ algorithm to decide if an equation s = t between 
terms on A is valid, where n is the total number of symbols in the terms s,t. 

9 Adding 1 and Reverse 

The underlying order of the reverse $u^r$ of a word $u$ is $(L_u, >)$, i.e., the reverse
of $(L_u, <)$. The labeling function of the reverse is the same as that of $u$. We
may axiomatize the enrichment of the regular operations by the addition of the
empty word and the reverse operation, by adding the following axioms Rev.

$(x^r)^r = x$ (4)

$(x \cdot y)^r = y^r \cdot x^r$ (5)

$(x^{\omega})^r = (x^r)^{\omega^{op}}$ (6)

$([x_1, \dots, x_n]^{\eta})^r = [x_1^r, \dots, x_n^r]^{\eta}$ (7)

$1 \cdot x = x = x \cdot 1$ (8)

$(1)^{\omega} = (1)^{\omega^{op}} = 1$ (9)

$1^r = 1$ (10)

$[1]^{\eta} = 1$ (11)

$[1, x_1, \dots, x_n]^{\eta} = [x_1, \dots, x_n]^{\eta}$ (12)

Terms built from letters in the set $A$ using the regular operations, the constant
1 and the reverse operation $^r$ are extended terms. Each extended term $t$ on
$A$ denotes a word $|t|$.

We use Theorems 2 and 3 to derive the following generalizations. 
Theorem 4. Let $s, t$ be two extended terms on $A$. Then
$|s| = |t| \iff \mathrm{Ax} \cup \mathrm{Rev} \vdash s = t.$



Theorem 5. There is an 0{n^)~ algorithm to decide if an equation s = t between 
extended terms on A is valid, where n is the total number of symbols in the terms 
s, t. 



In the full version we will also prove the impossibility of finite axiomatiza- 
tions. 
1-Bounded TWA Cannot Be Determinized

Warsaw University



Abstract. Tree-walking automata are a natural sequential model for
recognizing tree languages which is closely connected to XML. Two long 
standing conjectures say TWA cannot recognize all regular languages 
and that deterministic TWA are weaker than nondeterministic TWA. 
We consider a weaker model, 1-TWA, of tree walking automata that 
can make only one pass through a tree. We show that deterministic 1- 
TWA are weaker than 1-TWA. We also show a very simple language not 
recognized by 1-TWA. 



1 Introduction 

Tree-walking Automata (TWA for short) are a natural sequential model for 
recognizing tree languages. Originally introduced by Aho and Ullman in 1971 
[1], they have of late undergone something of a revival in a research program 
initiated by Engelfriet et al. [4,5]. TWA have also been gathering interest thanks 
to the advent of XML (see [7,2], or the survey [8]). 

A TWA is similar to a finite word automaton. In any given moment it is 
positioned in a single vertex of the tree assuming one of a finite number of 
states. Based on this state and the label together with some other information 
about the current vertex, the automaton can change its state and move to some 
neighboring vertex. 

The exact nature of this other information varies from definition to definition. 
Kamimura and Slutzki [6] showed that if this does not include the child number 
of the current vertex, it cannot even search the tree in a systematic way, such 
as doing a depth-first search. The now standard model assumes a TWA knows 
whether the current vertex is the root, a leaf, a left son or a right son. 

Although TWA have a long history, very little is known about them. One
can easily prove that TWA-recognizable tree languages are regular, i.e., can be
recognized by any one of several equivalent models of (branching) tree automata.
However, many of the most fundamental questions, as posed in [8], still remain 
open: 

1. Do TWA capture the regular tree languages? 

2. Are deterministic TWA as expressive as TWA? 

3. Are TWA closed under complementation? 

* This research project was partly supported by Komitet Badań Naukowych, grant
4 T11C 042 25 for the years 2003-2006, and the European Community Research
Training Network “Games and Automata for Synthesis and Validation”

P.K. Pandya and J. Radhakrishnan (Eds.): FSTTCS 2003, LNCS 2914, pp. 62-73, 2003.
© Springer-Verlag Berlin Heidelberg 2003
The only progress made to date regarding these questions was by Neven and
Schwentick, who in [9] presented a regular language that cannot be recognized
by any one-bounded TWA (1-TWA for short), i.e., one that traverses every edge
at most once in each direction.

Our main contribution is a proof that one-bounded deterministic TWA
(1-DTWA) do not recognize the same languages as 1-TWA. The language we use
is the set $L_1$ of $\{a, b, c\}$-labeled trees in which there exists an occurrence of the
letter $b$ below some occurrence of the letter $a$ (or, equivalently, denoted by the
expression /a/b in XPath). Intuitively, we show that a 1-DTWA cannot keep
track of whether or not it is below some $a$ vertex.

We are able to show this directly, using monoid combinatorics, only if A is a
depth-first search (DFS) automaton. A DFS automaton is a 1-DTWA that visits
vertices according to the lexicographical ordering. Such an automaton can, for
instance, recognize the set of $\{\wedge, \vee, 0, 1\}$-labeled trees that correspond to
well-formed boolean expressions evaluating to 1.

After showing that no DFS automaton can recognize $L_1$, we prove that DFS-like
behavior can be forced in an arbitrary 1-DTWA. Thus we obtain:

Theorem 1 

No deterministic 1-TWA recognizes $L_1$, but there is a nondeterministic 1-TWA
recognizing $L_1$.

Next we consider the language $L_2$ of $\{a, b, c\}$-labeled trees where below some
occurrence of $a$ there exist two occurrences of $b$ which are not on the same
branch. Using similar monoid combinatorics as in the case of $L_1$, we show that
$L_2$ cannot be recognized by any 1-TWA. This is similar to the result of Neven
and Schwentick mentioned above.

We think, however, that our example has two relative merits: first, the proof
is quite straightforward, and second, our language $L_2$ has very simple logical
properties - for instance it can be defined using a CTL [3] formula. Thus proving
our conjecture that $L_2$ is not recognized by arbitrary tree walking automata
would show how severely limited TWA are.

Our language $L_1$ on the one hand, and the aforementioned language of Neven
and Schwentick as well as our $L_2$ on the other, show that the following inequalities
hold (where REG is the class of regular languages):

1-DTWA $\subsetneq$ 1-TWA $\subsetneq$ REG

2 Tree Walking Automata 

In the paper we will be dealing with finite, binary, labeled trees. Let $\Delta$ be a
finite prefix-closed subset of $\{0,1\}^*$ such that $v \cdot 0 \in \Delta \iff v \cdot 1 \in \Delta$ and let
$\Sigma$ be some finite set. A $\Sigma$-tree is any function $t : \Delta \to \Sigma$. The domain dom($t$)
of the tree $t$ is the set $\Delta$. A vertex of $t$ is any element $v \in$ dom($t$). For vertices
we use the prefix ordering: $v < w$ if $w = v \cdot v'$ for some $v' \in \{0,1\}^*$. We denote the
set of $\Sigma$-labeled trees by trees($\Sigma$). Given $t, t' \in$ trees($\Sigma$) and $\sigma \in \Sigma$, let $(t, \sigma, t')$
denote the tree that has $\sigma$ in the root, whose left subtree is $t$ and whose right
subtree is $t'$.

A tree with a hole $t[\,]$ is a tree $t$ with some distinguished leaf $v \in$ dom($t$).
Given another tree $s$, the tree $t[s]$ is obtained from $t$ by substituting the tree $s$ for
the leaf $v$. We distinguish the tree with a hole $[\,]$, which has the hole in the root.
Similarly we define a tree with two holes $t[\,,]$; we assume the first hole is to the
left of the second one.

A tree walking automaton is a tuple $(Q, \Sigma, \delta, q_I, F)$, where $Q$ is the set of
states, $\Sigma$ is the alphabet, $\delta$ is the transition function, $q_I \in Q$ the initial state and
$F \subseteq Q$ the set of accepting states. The function $\delta$ is of the form:

$\delta : \{\downarrow, \uparrow_0, \uparrow_1, \bot\} \times Q \times \Sigma \to P(Q \times \{\uparrow, \downarrow_0, \downarrow_1\})$

We will describe the run of A over some $\Sigma$-labeled tree. The automaton starts
in the root of the tree assuming the initial state. It then walks around the tree,
choosing nondeterministically a direction to move based on the current state,
the label in the current vertex and information on what it did in the previous
move. This information can assume several values: $\downarrow$ if it entered the current
vertex from its parent, $\uparrow_0$ if it entered from a left son, $\uparrow_1$ if it entered from the
right son and $\bot$ if there was an error in the previous move.

Based on this information, A decides whether it wants to move to the ancestor
by choosing $\uparrow$; or down into the left or right son by choosing $\downarrow_0$ or $\downarrow_1$ respectively.
The previously mentioned error can occur if the automaton tried to move up in
the root, or tried to move down in a leaf. In these cases, the error $\bot$ is reported
and the automaton does not move. By convention, we assume that in the initial
configuration, A remembers that in the previous move an error occurred. A run
is accepting if A enters an accepting state. A tree $t$ is accepted by A if there is
an accepting run over $t$.

A deterministic tree walking automaton, DTWA for short, is defined like a
nondeterministic TWA, except that $\delta$ returns only one result:

$\delta : \{\downarrow, \uparrow_0, \uparrow_1, \bot\} \times Q \times \Sigma \to Q \times \{\uparrow, \downarrow_0, \downarrow_1\}$

We say a TWA (resp. DTWA) is 1-bounded if it traverses every edge at most once
in each direction in every possible run. Let 1-TWA be the class of 1-bounded
nondeterministic TWA and let 1-DTWA be the class of 1-bounded DTWA.
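The run of a deterministic automaton in this history-based formalism can be simulated in a few lines. The sketch below is illustrative and not taken from the paper: trees are assumed to be dicts from binary strings to labels, the transition function is a dict (a missing entry is treated as the run getting stuck), and `max_steps` is an arbitrary cap added so the simulation always halts. The example automaton is a hypothetical depth-first searcher that accepts as soon as it sees the label b; it is 1-bounded but recognizes a much simpler language than the ones studied here.

```python
# How the current vertex was entered (first component of delta's argument).
DOWN, UP0, UP1, ERR = "down", "up0", "up1", "err"
# Moves the automaton may choose.
MOVE_UP, MOVE_L, MOVE_R = "up", "left", "right"


def run_dtwa(delta, q_init, accepting, tree, max_steps=10_000):
    """Simulate a DTWA; return True iff it enters an accepting state."""
    v, q, info = "", q_init, ERR          # start at the root, "error" history
    for _ in range(max_steps):
        if q in accepting:
            return True
        key = (info, q, tree[v])
        if key not in delta:              # assumption: no transition = stuck
            return False
        q, move = delta[key]
        if move == MOVE_UP:
            if v == "":
                info = ERR                # cannot move up from the root
            else:
                info = UP0 if v[-1] == "0" else UP1
                v = v[:-1]
        else:
            child = v + ("0" if move == MOVE_L else "1")
            if child in tree:
                v, info = child, DOWN
            else:
                info = ERR                # cannot move down from a leaf
    return q in accepting


# Hypothetical example: depth-first search for a vertex labeled "b".
delta = {}
for lab in "abc":
    for info, q in [(ERR, "go"), (DOWN, "tried")]:   # just arrived at a vertex
        delta[(info, q, lab)] = ("found", MOVE_UP) if lab == "b" else ("tried", MOVE_L)
    delta[(ERR, "tried", lab)] = ("back", MOVE_UP)    # leaf: left move failed
    delta[(UP0, "back", lab)] = ("tried", MOVE_R)     # left subtree finished
    delta[(UP1, "back", lab)] = ("back", MOVE_UP)     # right subtree finished
```

Keeping the previous move as an explicit `info` value mirrors the choice made in the text: the same transition table can be run on a subtree without knowing whether that subtree hangs as a left or a right son.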

Those familiar with TWA will notice that our definition is slightly nonstandard.
Normally, a TWA decides what move to make based not on the type
$\{\downarrow, \uparrow_0, \uparrow_1, \bot\}$ of the previous move, but on information whether the vertex is:
the root, a left son, a right son or a leaf. It is an easy exercise to show that these 
formalisms are equivalent. However, we choose the history-based approach so 
that we can talk about runs on a subtree s of a larger tree t without specifying 
whether s is rooted as a left or right son in t. 

2.1 Finite Monoid Equations 

Let Var $= \{x, y, z, \dots\}$ be an infinite set of variables. Given a finite set $\Sigma$
disjoint with Var, a $\Sigma$-equation is an expression of the form $w = w'$, where
$w, w' \in (\Sigma \cup \mathrm{Var})^*$. A function $\nu : \mathrm{Var} \to \Sigma^*$ is called a valuation. Let $\nu^* :
(\mathrm{Var} \cup \Sigma)^* \to \Sigma^*$ be the function substituting $\nu(x)$ for every variable $x$.
Given a homomorphism $h : \Sigma^* \to M$ from the free monoid $\Sigma^*$ into some finite
monoid $M$, we say $\nu$ satisfies $(h, w = w')$ if $h(\nu^*(w)) = h(\nu^*(w'))$.

A set $E$ of $\Sigma$-equations is finitely solvable if for every finite monoid $M$ and
every homomorphism $h : \Sigma^* \to M$, there is a valuation $\nu$ such that $\nu$ satisfies
$(h, e)$ for all $e \in E$. A constrained set of equations is a set of equations augmented
with constraints on the valuation $\nu$. In this paper we consider constraints of the
form $a \in \nu(x)$ or $a \notin \nu(x)$, stating what letters $a \in \Sigma$ must or cannot appear in
$\nu(x)$.

Example 1. For every alphabet $\Sigma$ and any satisfiable set of constraints, the equation
$x = xx$ is finitely solvable. This follows from the following Lemma:

Lemma 1. In every finite monoid $M$ there is some $n \in \mathbb{N}$ such that for all
elements $a$ of $M$, $a^n = a^{2n}$.

Proof. Since $M$ is finite, $a^j = a^{j+k}$ for some $j, k \le |M|$. But then $a^m = a^{m + l \cdot k}$
for all $m \ge j$, $l \ge 0$. In particular $a^{|M|!} = a^{2 \cdot |M|!}$. □
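The lemma is easy to check by brute force on a small monoid. The sketch below is an illustration under assumptions, not part of the paper: the helper names are hypothetical, the monoid is passed as a list of elements with a multiplication function and a unit, and the example monoid is the integers modulo 6 under multiplication. It tests whether a given n satisfies a^n = a^(2n) for every element, and searches for the least positive such n, using |M|! as an upper bound.

```python
from math import factorial


def power(mul, one, a, n):
    """Compute a^n in the monoid by repeated multiplication."""
    result = one
    for _ in range(n):
        result = mul(result, a)
    return result


def is_uniform_exponent(elements, mul, one, n):
    """True iff a^n == a^(2n) for every element a."""
    return all(power(mul, one, a, n) == power(mul, one, a, 2 * n) for a in elements)


def least_uniform_exponent(elements, mul, one):
    """Least n >= 1 with a^n == a^(2n) for all a; |M|! always works."""
    bound = factorial(len(elements))
    for n in range(1, bound + 1):
        if is_uniform_exponent(elements, mul, one, n):
            return n
    raise AssertionError("unreachable for a finite monoid")


# Assumed example: integers modulo 6 under multiplication.
Z6 = list(range(6))


def mul_mod6(x, y):
    return (x * y) % 6
```

For this example n = 1 fails because 2 is not idempotent modulo 6, while n = 2 already works; the bound |M|! is convenient for proofs but usually far from minimal.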

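Example 2. The following set of constrained $\{a, 0, 1\}$-equations is finitely solvable:

$x \cdot 0 \cdot y_0 = x' \cdot 0 \cdot y_0$

$x \cdot 1 \cdot y_1 = x' \cdot 1 \cdot y_1$

$a \in \nu(x) \qquad a \notin \nu(x')$

Let $h : \{0, 1, a\}^* \to M$ be an arbitrary homomorphism and let $n$ be the constant
from Lemma 1 appropriate to the monoid $M$. Let $\alpha' = 1 \cdot 0^n$ and $\alpha = 1 \cdot a \cdot 0^n$.
Given a word $\sigma$, let $\sigma^{-1} \cdot b$ be the word $b'$ such that $\sigma \cdot b' = b$. We define the
valuation $\nu$ as follows:

$\nu(x) = \alpha' \cdot \alpha^n \qquad \nu(x') = \alpha' \qquad \nu(y_0) = 0^{n-1} \cdot \alpha^n \qquad \nu(y_1) = 1^{-1} \cdot \alpha^n$

We must show that:

$h(\alpha' \cdot \alpha^n \cdot 0^n \cdot \alpha^n) = h(\alpha' \cdot 0^n \cdot \alpha^n)$

$h(\alpha' \cdot \alpha^n \cdot \alpha^n) = h(\alpha' \cdot \alpha^n)$

These equations are true because, by Lemma 1, $h(\alpha' \cdot 0^n) = h(\alpha')$, $h(\alpha \cdot 0^n) = h(\alpha)$,
and $h(\alpha^n) = h(\alpha^n \cdot \alpha^n)$.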

3 Separating 1-DTWA from TWA 

In this and the following section we will define two languages $L_1$ and $L_2$ that we
conjecture to separate the DTWA from TWA and TWA from regular languages
respectively. The first language, $L_1$, is the following set of $\{a, b, c\}$-trees:

$L_1 = \{t : \exists x, y \in \mathrm{dom}(t).\ t(x) = a \wedge t(y) = b \wedge x < y\}$

We are only able to prove that $L_1$ cannot be recognized by any 1-DTWA. The
proof is split into two parts. First we show that no deterministic automaton that
does a depth-first search of the tree can recognize $L_1$. Then we generalize this
result to arbitrary 1-DTWA.
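Membership in the first language is simple to test directly when the whole tree is available, which underlines that the difficulty lies in the restricted walking model rather than in the language. A minimal sketch, assuming a tree is a dict from vertices (binary strings, with the empty string as root) to labels and that its domain is prefix-closed; the name `in_L1` is hypothetical:

```python
def in_L1(tree):
    """True iff some vertex labeled 'b' lies strictly below a vertex labeled 'a'."""
    for y, label in tree.items():
        if label != "b":
            continue
        # The proper prefixes of y are exactly the vertices strictly above y.
        if any(tree[y[:i]] == "a" for i in range(len(y))):
            return True
    return False
```

The check looks at every ancestor of every b vertex, so it freely revisits edges; a 1-bounded automaton has no such freedom.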



3.1 A DFS Automaton Cannot Recognize $L_1$

A DFS automaton is a 1-DTWA that in every run visits $v$ before $w$ iff $v$ is
lexicographically before $w$. Given a DFS automaton A, we will prepare two
trees, $t \in L_1$ and $t' \notin L_1$ such that A cannot distinguish between them. In these
trees we distinguish several nodes: $w_1, w_2, w_3 \in$ dom($t$) and $w'_1, w'_2, w'_3 \in$ dom($t'$).
We then claim that they are corresponding in the sense that they have similar
histories - for instance the state assumed by A the second time it visits $w_2$ is
the same as the state assumed in $w'_2$ in the corresponding visit. Finally, we show
that A cannot distinguish the paths that lead from $w_3$ to the root in $t$ and from
$w'_3$ to the root in $t'$ respectively, thus showing that A either accepts both trees
or rejects both trees.

The trees $t$ and $t'$ will have essentially very little branching - in fact they have
only one “real” branching point each, apart from which we use trees obtained
via an encoding operation, which given a word $\alpha \in \{0, 1, a\}^*$, produces a tree
with a hole $\alpha[\,]$ (see Fig. 1 for the definition of $\alpha[\,]$).



$0\alpha[\,] = (\alpha[\,], c, c) \qquad 1\alpha[\,] = (c, c, \alpha[\,]) \qquad a\alpha[\,] = (\alpha[\,], a, c)$

Fig. 1. The operation assigning $\alpha[\,]$ to $\alpha$



Let $\alpha, \alpha', \beta, \beta', \gamma_1, \gamma_2 \in \{0, 1, a\}^*$ be some words which we will specify later
by solving certain equations. We only require that $\alpha$ contains the letter $a$ and
$\alpha'$, $\gamma_1$ do not. The trees $t$ and $t'$ are defined as follows (see Fig. 2 for a graphic
representation):

$t_1 = \gamma_1[(b, c, \gamma_2[c])] \qquad t = \alpha[(\beta[c], c, t_1)] \qquad t' = \alpha'[(\beta'[c], c, t_1)]$

By assumption on $\alpha$, $\alpha'$ and $\gamma_1$, we have $t \in L_1$ and $t' \notin L_1$. We will show
that for every DFS automaton A, we can find words $\alpha, \alpha', \beta, \beta', \gamma_1, \gamma_2$ such that
A cannot distinguish between $t$ and $t'$.

Fig. 2. The trees $t$ and $t'$

A partial state transformation is any partial function from $Q$ to $Q$. For every
word $\alpha \in \{0, 1, a\}^*$ we will define two partial state transformations $f^{\uparrow}_{\alpha}$ and $f^{\downarrow}_{\alpha}$.
The intuition behind this being that $f^{\uparrow}_{\alpha}$ assigns to a state $q$ the state that A
will end in if it starts in the hole of $\alpha[\,]$ in state $q$ and moves toward the root
continuing the depth-first search. If in this run the automaton does not behave
like a DFS automaton - it wants to visit something twice, for instance - $f^{\uparrow}_{\alpha}(q)$
is undefined. A similar intuition goes for $f^{\downarrow}_{\alpha}$, except that we start in the root
and end up in the hole this time.

We will only define $f^{\uparrow}_{a}(q)$, leaving the reader to define $f^{\uparrow}_{0}$, $f^{\uparrow}_{1}$, $f^{\downarrow}_{a}$, $f^{\downarrow}_{0}$, $f^{\downarrow}_{1}$
analogously. Let $q_1, q_2, q_3, q_4$ be states satisfying the following equations:

$(q_1, \downarrow_1) = \delta(\uparrow_0, a, q) \qquad (q_2, \downarrow_0) = \delta(\downarrow, c, q_1)$

$(q_3, \uparrow) = \delta(\bot, c, q_2) \qquad (q_4, \uparrow) = \delta(\uparrow_1, a, q_3)$

If such states cannot be found, this means that A does not behave like a DFS
automaton and $f^{\uparrow}_{a}(q)$ is undefined, otherwise $f^{\uparrow}_{a}(q) = q_4$. We extend $f^{\uparrow}$, $f^{\downarrow}$ from
letters to arbitrary words in $\{0, 1, a\}^*$ in the obvious fashion:

$f^{\uparrow}_{\alpha\beta} = f^{\uparrow}_{\alpha} \circ f^{\uparrow}_{\beta} \qquad f^{\downarrow}_{\alpha\beta} = f^{\downarrow}_{\beta} \circ f^{\downarrow}_{\alpha}$

Consider the run of A on the tree $t$. Let $q_1$ be the state of A assumed when
first arriving in the vertex $w_2$ (see Fig. 2). Let $q_2$ be the state assumed when
coming back from the left son to $w_1$; $q_3$ when first arriving at $w_3$; and, finally,
let $q_4$ be the state upon coming back to the root upon reading the whole tree.
Analogously we define the states $q'_1, q'_2, q'_3, q'_4$, which describe the run on the tree
$t'$. An easy analysis shows that if A is a DFS automaton then these states satisfy
the following equations:

$q_1 = f^{\downarrow}_{\alpha \cdot 0 \cdot \beta}(q_I) \qquad q'_1 = f^{\downarrow}_{\alpha' \cdot 0 \cdot \beta'}(q_I)$ (1)

$q_2 = f^{\uparrow}_{0 \cdot \beta}(g_c(q_1)) \qquad q'_2 = f^{\uparrow}_{0 \cdot \beta'}(g_c(q'_1))$ (2)

$q_3 = g(q_2) \qquad q'_3 = g(q'_2)$

$q_4 = f^{\uparrow}_{\alpha \cdot 1 \cdot \gamma_1 \cdot 1 \cdot \gamma_2}(q_3) \qquad q'_4 = f^{\uparrow}_{\alpha' \cdot 1 \cdot \gamma_1 \cdot 1 \cdot \gamma_2}(q'_3)$ (3)

where

— $q_I$ is the initial state of A

— $g$ is the state transformation induced by going down the branch $\gamma_1$, into the
letter $b$, back up one edge, and down the branch $\gamma_2$

— $g_c$ is the state transformation induced by entering a $c$ leaf and leaving it

Obviously, if we can prove that the corresponding $f^{\downarrow}$, $f^{\uparrow}$ functions in the above
equations are equal, then we will have proven that $q_4 = q'_4$ and, consequently,
that A does not distinguish between $t$ and $t'$. Using similar techniques as in
Example 2, we can show that the following set of $\{a, 0, 1\}$-equations is finitely
solvable:

$x \cdot 0 \cdot y = x' \cdot 0 \cdot y'$ ($e_1$)

$0 \cdot y = 0 \cdot y'$ ($e_2$)

$x \cdot 1 \cdot z_1 \cdot 1 \cdot z_2 = x' \cdot 1 \cdot z_1 \cdot 1 \cdot z_2$ ($e_3$)

$a \in \nu(x) \qquad a \notin \nu(x')$ (constraints)



Let $M_Q$ be the monoid whose elements are partial state transformations and the
monoid operation is composition. Given an arbitrary monoid $M$, let $M^{r}$ denote
the monoid whose operation is defined:

$a \cdot_{M^{r}} b = b \cdot_{M} a$

The function $h$ associating with each word $\alpha$ the functions $f^{\uparrow}_{\alpha}$, $f^{\downarrow}_{\alpha}$ is a homomorphism
from $\{a, 0, 1\}^*$ into the monoid $M_Q \times M_Q^{r}$. Let $\nu$ be a valuation satisfying
$(h, e_1)$, $(h, e_2)$ and $(h, e_3)$, which exists by the finite solvability of the above
equations. Then the equations (1), (2) and (3) will be satisfied if we set:

$\alpha = \nu(x) \quad \alpha' = \nu(x') \quad \beta = \nu(y) \quad \beta' = \nu(y') \quad \gamma_1 = \nu(z_1) \quad \gamma_2 = \nu(z_2)$



3.2 From DFS Automata to Arbitrary 1-DTWA 

We fix a 1-DTWA A for this section. We will show that A cannot recognize L_1
by reducing the problem to DFS automata. The reduction is accomplished by
replacing in t and t' every σ-labeled (for σ ∈ {a, c}) inner node with a tree S_σ[ , ],
which has the property that if A first visits the left son in the root of S_σ, then
it will first visit the left son in both of the holes.

Lemma 2. If L(A) = L_1 then A must visit every node in a tree outside L_1.

Proof. Assume A does not visit a node v in the tree s ∉ L_1. Let s' be some tree
in L_1. Since s[v := s'] ∈ L_1, the desired contradiction comes from the fact that
A accepts s iff A accepts s[v := s']. □
We say q is an entering state if in some run there is a vertex v such that
when A first visits v, state q is assumed. A state is clean entering if one of the
runs in question is over a tree outside L_1. It is c-clean entering if additionally
all vertices between v and the root are labeled by c. Let σ ∈ {a, c}. A tree t is
(q,σ)-clean if either:

— q is c-clean entering and (t, σ, t) ∉ L_1;

— q is merely clean entering and (t, a, t) ∉ L_1




Fig. 3. The tree S_σ



We are now going to define for σ ∈ {a, c} a tree S_σ that will allow us to
simulate DFS automaton behavior in an arbitrary 1-DTWA. Let n = |Q|. We
define the tree as follows (see Fig. 3):

dom(S_σ) = {0^i : i < 2(n! + n)} ∪ {1^i : i < 2(n! + n)} ∪

∪ {0^i · 1 : i < 2(n! + n)} ∪ {1^i · 0 : i < 2(n! + n)}

, , _ (a if u = or V = for some i 

^ ( c otherwise 

Let So [, ] be a tree with two holes which is obtained from So by placing the 
first hole instead of the vertex 0 ^"' and placing the second hole instead of the 
vertex 

Lemma 3. Let q be a clean entering state, σ ∈ {a, c}, and let t, t' ∈
trees({a, b, c}) be such that t(ε) = t'(ε) = c and t is (q,σ)-clean. If A first
visits the left son in the root of So then in the run on So[t,t'], A first visits the 
left son in the vertices 0 ^"' and 

Proof. Let g be a clean entering state and assume that (5(1, c,q) = ((^Mo)? i- e. 
A first goes into the left subtree. Consider now the run p on So that starts in 
this state q. First note that, in accordance with Lemma 2, this run must visit 
every vertex of the tree. 

For i G {0, . . . ,n! -I- n}, let q^ be the state assumed in 0^* after going up 
from By a counting argument, for some j < n, ( 7 "'+^ = But then 
g"' = q^'-i = . . . = gO. Since A must visit all the tree and the left subtree of the
root e was visited first, <5(to; c) = <5(to, c) = {q' , |i) for some q' G Q. But 
this means that the left son of 0^"' was visited before the right son. By a similar 
reasoning, we show that the left son of 1^"' was visited before the right son. 

Finally, to finish the proof of the Lemma, we notice that the decision to first 
visit the left subtree in 0^"' is made before visiting the actual subtree. This means 
that the result stays the same no matter what tree we substitute under the vertex 
0^”', as long as this tree has c in the root. A similar argument can be made for 
P”', however this time we must use the assumption that the tree substituted 
under 0^"' was {q, (j)-clean, since otherwise we could not have assumed in the 
proof that A needs to visit the whole tree. □ 



Lemma 4. A does not recognize L_1.

Proof. Without loss of generality, we assume that in the root and initial state
q_I, A first visits the left subtree. For every tree s ∈ trees({a, b, c}) that has inner
nodes labeled only by a, c, we define a tree ŝ ∈ trees({a, b, c}) by induction.
If s = (s_0, σ, s_1), then ŝ = S_σ[ŝ_0, ŝ_1]; otherwise dom(s) = {ε} and then ŝ = s.
Given v ∈ dom(s), let v̂ ∈ dom(ŝ) be the vertex whose subtree is the tree built from s|_v. We will
prove that A does not distinguish between t̂ and t̂', where the trees t and t' are
the ones discussed in the previous section.

For such a tree s, we call a vertex v of ŝ ŝ-main if v = ŵ for some w ∈ dom(s).

Since t' ∉ L_1, the assumptions of Lemma 3 apply and A behaves like a DFS
automaton in its entire run over t̂', i.e., it visits the left son before the right
son in every t̂'-main vertex. By a similar reasoning, in the run on t̂, A behaves
like a DFS automaton before entering the right son of w_1. By the results on
DFS automata, the states assumed by A before entering the right son of w_1 and
before entering the right son of w'_1 are the same. Since the right subtrees of w_1
and w'_1 are the same, the vertices w_3 and w'_3 are reached in the same state.

In order not to violate the 1-bounded condition, A must now go back to the 
root without visiting the left son of any main vertex. Note that here we use 
the fact that no run of a 1-bounded TWA can traverse an edge twice in the 
same direction. Thus, by the results on DFS automata, A does not distinguish 
between t and t'. □ 



Theorem 1. No deterministic 1-TWA recognizes L_1, but there is a nondeterministic
1-TWA recognizing L_1.

Proof. This follows immediately from Lemma 4 and the fact that L_1 can be easily
recognized by a 1-TWA that guesses some position labeled by a and then some
deeper position labeled by b. □

In fact, we conjecture: 

Conjecture 1. No deterministic TWA recognizes L_1. Consequently, deterministic
TWA recognize fewer languages than their nondeterministic counterparts. 
4 Separating 1-TWA from REG

Consider the regular language L_2 ⊆ trees({a, b, c}) consisting of trees where
there exist two incomparable occurrences of b below some occurrence of a.
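
As an illustration only, membership in L_2 can be checked directly by a recursive procedure. The sketch below follows the prose definition above; the tree encoding (a leaf is a label string, an inner node is a triple) and all function names are assumptions made for this example and are not part of the paper.

```python
# A tree is a leaf label such as "b", or a triple (left, label, right).

def contains(tree, label):
    """True if some node of `tree`, root included, carries `label`."""
    if isinstance(tree, str):
        return tree == label
    left, root, right = tree
    return root == label or contains(left, label) or contains(right, label)


def has_incomparable_bs(tree):
    """True if two b-nodes lie in different subtrees of a common ancestor."""
    if isinstance(tree, str):
        return False
    left, _, right = tree
    if contains(left, "b") and contains(right, "b"):
        return True
    return has_incomparable_bs(left) or has_incomparable_bs(right)


def in_L2(tree):
    """True if some a-node has two incomparable b-nodes below it."""
    if isinstance(tree, str):
        return False
    left, root, right = tree
    if root == "a" and has_incomparable_bs(tree):
        return True
    return in_L2(left) or in_L2(right)
```

This procedure inspects subtrees freely, which is exactly what a tree-walking automaton with a fixed number of states cannot obviously do; the code only makes the language concrete and says nothing about TWA recognizability.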

Conjecture 2. No TWA recognizes L_2.

From a logical point of view, this language is rather simple, as testified by:

— In XPath, L_2 is the set of trees satisfying the query /a[/[*[1]//b][*[2]/b]];

— An obvious TWA that uses conjunction can recognize L_2;

— In CTL over ordered trees, L_2 is the set of trees satisfying (0 and 1 stand
for child number tests) EF[a ∧ EX(0 ∧ EFb) ∧ EX(1 ∧ EFb)]

Consequently, proving Conjecture 2 would show that TWA do not subsume
the above formalisms. Unfortunately, we can only prove that no 1-TWA can
recognize L_2.
Fig. 4. The trees t and t'



Lemma 5. No 1-TWA recognizes L_2.

Proof (sketch). Let A be some 1-TWA. We take some words α, α', β_0, β_1 ∈
{a, 0, 1}* such that a ∈ α, a ∉ α' and define the trees t, t' as in Fig. 4. By
assumption on α and α', we have t ∈ L_2 and t' ∉ L_2. We will prove that, given
a suitable choice of α, α', β_0, β_1, if A accepts t then A accepts t'.

Since we are now dealing with nondeterministic automata, the state transformations
will be replaced by arbitrary relations, which associate with a state
all the possible states reachable from it via a particular tree. Let N_Q be the
monoid whose elements are binary relations over Q and whose operation is the
standard composition of relations:

(q, q') ∈ R ∘ S  ⇔  ∃q''. (q, q'') ∈ R ∧ (q'', q') ∈ S
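
The monoid N_Q can be made concrete with a few lines of code. The following is a minimal sketch, assuming relations are represented as sets of state pairs; the names `compose` and `identity` are chosen for this example only.

```python
def compose(R, S):
    """R ∘ S: all (q, q') such that (q, q'') is in R and (q'', q') is in S."""
    return frozenset((q, q2) for (q, q1) in R for (p, q2) in S if q1 == p)


def identity(Q):
    """The neutral element of the monoid: the identity relation on Q."""
    return frozenset((q, q) for q in Q)


# Small usage example over Q = {1, 2, 3}.
Q = {1, 2, 3}
R = frozenset({(1, 2)})
S = frozenset({(2, 3)})
T = frozenset({(3, 1)})
RS = compose(R, S)
```

Composition is associative and the identity relation is neutral, which is what makes N_Q a monoid; it is not commutative, as `compose(R, S)` and `compose(S, R)` differ in the example.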
With each word γ ∈ {a, 0, 1}* we associate two relations R↑_γ and R↓_γ. Intuitively,
(q, q') ∈ R↑_γ if there is a run that starts in q in the hole of γ[] and ends up in q'
in the root, and a similar definition holds for R↓_γ. Since the function associating
these relations with a word is a homomorphism into the monoid N_Q × N_Q, we
can finish the proof by solving the following constrained equations:

x · 0 · y_0 = x' · 0 · y_0
x · 1 · y_1 = x' · 1 · y_1
a ∈ ν(x)    a ∉ ν(x')

These are, of course, the equations from Example 2, and we know that they are
finitely solvable. This means that there are words α, α', β_0, β_1 such that:

(A) R↑_{α·0·β_0} = R↑_{α'·0·β_0}        R↓_{α·0·β_0} = R↓_{α'·0·β_0} (B)

(C) R↑_{α·1·β_1} = R↑_{α'·1·β_1}        R↓_{α·1·β_1} = R↓_{α'·1·β_1} (D)

We use these words in the trees t and t'. Consider an accepting run of A on
the tree t. Obviously, it must visit both w_2 and w_3. Assume, without loss of
generality, that w_2 is reached before w_3, in state q. Since in this run the right
son of w_1 was not visited, there is also a run over the tree α · 0 · β_0[b] that
assumes state q in the b node. By equation (B), there is a run over the tree
α' · 0 · β_0[b] that also assumes state q in the b node.

Consider first the case when, in this second run, the right son of the vertex
corresponding to w'_1 is visited. However, in the run over t, starting in state q
and vertex w_2, the automaton goes up and then into the right son of w_1. This
first case would then give us a run over α' · 0 · β_0[b] that violates the 1-bounded
condition. Here we use the fact that no run of a 1-bounded TWA can traverse
an edge twice in the same direction. Thus the second case must hold, i.e., in
the run over α' · 0 · β_0[b] the right son of the vertex corresponding to w'_1 is not
visited. But this shows that A can enter w'_2 in state q.

In a similar manner we can copy the rest of the accepting run onto the tree
t', which shows that A cannot recognize L_2. □

5 Concluding Remarks and Further Work 

In this paper we have proved that 1-DTWA recognize fewer languages than
1-TWA, which in turn do not recognize all the regular languages. The result
about 1-DTWA is new and may be a first step toward proving the long-standing
conjecture that DTWA do not recognize all languages recognized by TWA. The
second result is an improvement on a language of Neven and Schwentick, which
also separates 1-TWA from the regular languages.

The proofs of both our results require solving certain kinds of monoid equa- 
tions. Apart from giving answers to the conjectures 1 and 2, a good topic for 
further work is the question: 
Is the finite solvability of (constrained) monoid equations decidable?

Apart from an independent interest, an answer to this question might be rele- 
vant to further work on the expressive power of TWA. It seems plausible that 
an attempt to continue the line of attack presented here might require solving 
numerous and more complicated equations. An automatic procedure might then 
come in handy; moreover techniques used in such a procedure might shed some 
insight on the TWA. 
Abstract. Process Rewrite Systems (PRS for short) subsume many
common (infinite-state) models such as pushdown systems and Petri
nets. They can be adopted as formal models of parallel programs
(multithreaded programs) with procedure calls. We develop automata
techniques for building finite representations of the forward/backward
sets of reachable configurations of PRSs modulo various term structural
equivalences (corresponding to properties of the operators of sequential
composition and parallel composition). We show that, in several
cases, these reachability sets can be represented by polynomial size finite
bottom-up tree automata. When associativity and commutativity of the
parallel composition are taken into account, nonregular representations
based on (a decidable class of) counter tree automata are sometimes
needed.



1 Introduction 

Automatic verification of software systems is nowadays one of the most challenging
research problems in computer-aided verification. A major difficulty
in addressing this problem is that reasoning about software systems
generally requires dealing with infinite-state models. Indeed,
regardless of the fact that they may manipulate data ranging over infinite domains,
programs can have unbounded control structures due to, e.g., recursive
procedure calls and multi-threading (dynamic creation of concurrent processes).

We consider in this paper as formal models of programs term rewrite systems
called PRS (for process rewrite systems) [May98], in the spirit of the approach
advocated in [EK99,EP00], and we develop automata-based techniques for performing
reachability analysis of these systems. A PRS is a finite set of rules of
the form t → t' where t and t' are terms built up from the idle process (“0”), a finite
set of process variables (X), sequential composition (“·”), and asynchronous
parallel composition (“||”).

The semantics of PRSs (see [May98]) considers terms modulo a structural
equivalence ~ which expresses the fact that 0 is a neutral element of “·” and
“||”, that “·” is associative, and that “||” is associative and commutative. With

* This work has been supported in part by the European IST-FET project ADVANCE 
(contract No. IST-1999-29082). 
this semantics, PRSs subsume well-known models such as pushdown systems
(equivalent to prefix rewrite systems) and Petri nets. Their relevance in the
modeling of programs (given as flow graph systems) is shown in works such as
[EK99,EP00,Esp02].

The problem we consider in this paper is computing representations of the
Post* and Pre* images of sets of process terms, for given PRS systems. Since
process terms can be seen as trees, we consider representations of sets of terms
based on (bottom-up) tree automata. We use in particular finite tree automata
to represent regular sets of terms. However, due to the associativity of “·” and
the associativity-commutativity of “||”, Post* and Pre* images of regular sets
are not regular in general [GD89]. Therefore, we adopt an approach inspired
from [LS98] which is based on (1) considering stronger structural equivalences
obtained by ignoring some of the properties of the operators “·” and “||”, and
(2) computing representatives of the reachability sets modulo the considered
structural equivalence, that is, a set which contains at least one representative of
each equivalence class, instead of computing the whole reachability set. The idea
is that in many cases (1) computing reachability sets for stronger equivalences
is “easier” (e.g., regular and effectively constructible), and (2) the reachability
analysis modulo ~ can be shown to be reducible to computing representatives
of the reachability set modulo some stronger equivalence.

Therefore, our aim in this work is to explore the problem of constructing
reachability sets of PRSs modulo the following equivalences: (1) term equality
(=), (2) the relation ~_0 which takes into account the neutrality of 0 w.r.t. “·”
and “||”, (3) the relation ~_s which, in addition, considers the associativity of
“·”, and (4) the equivalence ~. Given one of these equivalences ≡, we denote by
Post*_{R,≡} and Pre*_{R,≡} the forward and backward reachability relations modulo ≡.

In a first step, we consider the reachability problem modulo term equality.
We show that for every PRS system, regular tree languages are effectively closed
under Post*_R and Pre*_R images, and we give polynomial-time constructions of
the corresponding tree automata. Then, we show that these constructions can be
adapted to the case of the equivalence ~_0. Our results generalize those of [LS98]
concerning the class of PA systems, i.e., the subclass of PRS where all left-hand
sides of rules are process variables. (It is well known that PA is incomparable
with pushdown systems and Petri nets since it combines context-free processes,
equivalent to pushdown systems with one control state, with the so-called basic
parallel processes which are equivalent to synchronization-free Petri nets.)

Then, we consider the structural equivalences ~_s and ~. As said above, these
equivalences do not preserve regularity, and therefore, we address the problem of
constructing ≡-representatives of the Post*_{R,≡} and Pre*_{R,≡} images for these
equivalences.

In the case of the equivalence ~_s, we prove that regular ~_s-representatives
of the PRS Post* and Pre* images of regular tree languages are effectively
constructible, and we give polynomial-time constructions of corresponding finite
tree automata. Of course, the constructed reachability sets are in general
under-approximations of the reachability sets modulo ~. However, for a significant
class of parallel programs, the consideration of ~_|| is not necessary (their PRS
model modulo ~_s coincides with the one modulo ~), and therefore they can be
analyzed using our constructions.

In the case of the equivalence ~, we restrict ourselves to the class of PAD
systems, i.e., the subclass of PRS where “||” does not appear in left-hand sides
of rules. This subclass subsumes both pushdown and PA systems (it combines
prefix rewrite systems and synchronization-free Petri nets). The interest of this
class is that it allows modeling parallel programs where procedures can have
return values. (Taking into account return values is possible with pushdown
systems for instance, but not with PA processes.)

First, we show that a regular ~-representative of the PAD Post*-image of
any regular tree language is polynomially constructible. As for computing the
Pre*-images, the problem is more delicate due to the parallel operators which
appear in the right-hand sides of the rules. In the case where the initial language
is closed under ~_||, we show that the same result as for Post*-images
holds. When the initial language can be any regular set, we are not able to
construct a regular ~-representative of the Pre*-images; nevertheless, we can
construct a nonregular ~-representative using counter tree automata. Fortunately,
the counter automata obtained by our construction have the property
that the emptiness problem of their intersection with regular sets is decidable.
This makes our construction useful for reachability analysis-based verification.

For lack of space, some of the constructions and all proofs are only sketched. 
Details can be found in the full paper. 



Related work: In [May98], Mayr considers the term reachability problem, i.e., 
given two PRS terms t and t', determine whether t' is reachable from t. He 
proved that this problem is decidable by a reduction to the reachability problem 
in Petri nets. The problem we consider here is different since we are interested 
in computing (a symbolic representation of) the set of all reachable configura-
tions, or a representative of it modulo a structural equivalence, starting from a 
(potentially infinite) set of configurations. 

Symbolic reachability analysis based on (word) automata techniques has been 
used for model-checking pushdown systems in, e.g., [BEM97,FWW97,EHRS00]. 
In [LS98], this approach is extended to PA processes using tree automata as sym-
bolic representation structures. The application of symbolic reachability analysis 
of PRSs to program (data flow) analysis is proposed and developed in the case 
of pushdown and PA systems in [EK99,EP00]. 

Our results in this paper generalize those given in [LS98,EP00] for PA. These 
works consider the construction of the reachability sets modulo term equality 
(without structural equivalences). In our work, we address this problem also 
modulo structural equivalences. Moreover, we show that the approach intro- 
duced in these papers can be extended to the more general classes of PAD and 
PRS. This allows reasoning about larger classes of programs (where procedures
can have return values, and where parallel processes can have some kind of
communication).
In [BET03] we consider the analysis of programs modeled by means of com-
municating pushdown systems. This problem is in general undecidable, and then 
we propose an algorithmic approach based on computing (either finite or com- 
mutative) abstractions of pushdown languages. The framework we consider here 
is different since (1) we do not consider in [BET03] dynamic creation of pro- 
cesses, and (2) we provide in [BET03] approximate analysis techniques for a 
precise model whereas we provide here precise analysis techniques for a “weaker” 
(more abstract) model (PRSs are weaker in the sense that they allow threads to 
synchronize only when their stacks are empty, which is the reason behind the 
decidability of their reachability problem). 

Finally, it is well known that ground term rewriting (GTR) systems preserve
regularity [DT90]. However, even if PRSs are syntactically sets of ground
rewriting rules, they are not standard GTR systems due to the semantics of the
operator “·” which imposes a particular rewriting strategy. Hence, even in the
case where the considered structural equivalence is term equality, our results are
not covered by [DT90].



2 Preliminaries 

2.1 Terms and Tree Automata 

An alphabet Σ is ranked if it is endowed with a mapping rank : Σ → ℕ. For
k ≥ 0, Σ_k is the set of elements of rank k. Let X be a fixed denumerable set of
variables {x_1, x_2, ...}. The set T_Σ[X] of terms over Σ and X is the smallest set
that satisfies: Σ_0 ∪ X ⊆ T_Σ[X], and if k ≥ 1, f ∈ Σ_k and t_1, ..., t_k ∈ T_Σ[X], then
f(t_1, ..., t_k) is in T_Σ[X]. T_Σ stands for T_Σ[∅]. Terms in T_Σ are called ground
terms. A term in T_Σ[X] is linear if each variable occurs at most once. A context
C is a linear term of T_Σ[X]. Let t_1, ..., t_n be terms of T_Σ, then C[t_1, ..., t_n]
denotes the term obtained by replacing in the context C the occurrence of the
variable x_i by the term t_i, for each 1 ≤ i ≤ n.

Definition 1 ([CDG+97]). A tree automaton is a tuple A = (Q, Σ, F, δ)
where Q is a set of states, Σ is a ranked alphabet, F ⊆ Q is a set of final states,
and δ is a set of rules of the form (1) f(q_1, ..., q_n) → q, or (2) a → q, or (3)
q → q', where a ∈ Σ_0, n ≥ 0, f ∈ Σ_n, and q_1, ..., q_n, q, q' ∈ Q. If Q is finite, A
is called a finite tree automaton.

Let →_δ be the move relation of A defined as follows: Given t and t' two
terms of T_{Σ∪Q}, then t →_δ t' iff there exist a context C ∈ T_{Σ∪Q}[X], and (1)
n ground terms t_1, ..., t_n ∈ T_Σ, and a rule f(q_1, ..., q_n) → q in δ, such that
t = C[f(q_1, ..., q_n)], and t' = C[q], or (2) a rule a → q in δ, such that t = C[a],
and t' = C[q], or (3) a rule q → q' in δ, such that t = C[q], and t' = C[q']. Let
→*_δ be the reflexive-transitive closure of →_δ. A term t is accepted by a state q ∈ Q
iff t →*_δ q. Let L_q be the set of terms accepted by q. The language accepted by
the automaton A is L(A) = ∪{L_q | q ∈ F}.
A tree language is regular if it is accepted by a finite tree automaton. The
class of regular tree languages is effectively closed under union, intersection, and 
complementation. The emptiness problem of finite tree automata can be solved 
in linear time. 
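
To make Definition 1 concrete, the following is a minimal sketch of a bottom-up run of a finite tree automaton. The encoding is an assumption made for illustration: a ground term is a tuple `(symbol, child_1, ..., child_n)`, rules of forms (1) and (2) are stored in a dictionary keyed by `(symbol, (q_1, ..., q_n))`, and rules of form (3) are given as a set of pairs `(q, q')`. The example automaton, which accepts process terms containing at least one occurrence of X, is hypothetical.

```python
def eps_closure(states, eps_rules):
    """Close a set of states under the rules q -> q' of form (3)."""
    result = set(states)
    changed = True
    while changed:
        changed = False
        for (q, q2) in eps_rules:
            if q in result and q2 not in result:
                result.add(q2)
                changed = True
    return result


def reachable_states(term, rules, eps_rules):
    """All states q such that `term` rewrites to q under the move relation."""
    symbol, children = term[0], term[1:]
    child_states = [reachable_states(c, rules, eps_rules) for c in children]
    found = set()
    for (sym, args), targets in rules.items():
        if sym == symbol and len(args) == len(children) and all(
            q in qs for q, qs in zip(args, child_states)
        ):
            found |= targets
    return eps_closure(found, eps_rules)


def accepts(term, rules, eps_rules, final):
    """A term is accepted if it reaches at least one final state."""
    return bool(reachable_states(term, rules, eps_rules) & final)


# Hypothetical automaton: state "x" means "contains X", state "z" means "does not".
rules = {("0", ()): {"z"}, ("X", ()): {"x"}}
for op in (".", "||"):
    for p in ("z", "x"):
        for q in ("z", "x"):
            rules[(op, (p, q))] = {"x" if "x" in (p, q) else "z"}

term = (".", ("0",), ("||", ("0",), ("X",)))
```

Computing the set of reachable states bottom-up handles nondeterminism without backtracking, at the cost of scanning all rules at each node; this is a readability choice for the sketch, not an efficient implementation.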

2.2 Counter Tree Automata 

We define hereafter counter tree automata and introduce a subclass of these
automata, called 0-test counter tree automata, which we use later in the reachability
analysis of process rewrite systems.

Definition 2. A counter tree automaton (CTA for short) is a tuple A =
(Q, Σ, F, c, δ) where Q is a finite set of control states, Σ is a ranked alphabet,
F ⊆ Q is a set of final states, c = (c_1, ..., c_m) is a vector of integer counters,
and δ is a set of rules of the form (A) f(q_1, ..., q_n) →^{μ(c)/k} q, or (B) a →^{k} q,
or (C) q →^{μ(c)/k} q', where a ∈ Σ_0, n ≥ 0, f ∈ Σ_n, q_1, ..., q_n, q, q' ∈ Q, μ(c)
is a formula in Presburger arithmetic depending on c, and k ∈ ℤ^m. (The rule
f(q_1, ..., q_n) →^{μ(c)/k} q, where μ = true and k = 0^m, will be denoted simply by
f(q_1, ..., q_n) → q.)

Intuitively, a CTA is a tree automaton supplied with a set of counters: The
rules (A), (B), and (C) behave respectively like the rules (1), (2), and (3) of
Definition 1. A rule f(q_1, ..., q_n) →^{μ(c)/k} q can be applied if the values of the
counters satisfy μ(c), and in this case, the vector of counters c is incremented
by k.

Formally, we define the language of the CTA A = (Q, Σ, F, c, δ) as the set of
ground terms accepted by an associated infinite tree automaton A_c that we introduce
hereafter. Let us first consider the following notation: Given a Presburger formula
μ(c) and a valuation v of c, we write v ⊨ μ iff μ(v) is true. Then, the automaton
A_c is given by (Q_c, Σ, F_c, δ_c) where Q_c = Q × ℤ^m, F_c = F × ℤ^m, and δ_c is the
smallest set of rules s.t.:

- f((q_1, v_1), ..., (q_n, v_n)) → (q, v) ∈ δ_c if f(q_1, ..., q_n) →^{μ(c)/k} q is in δ,
  Σ_{i=1}^{n} v_i ⊨ μ, and v = k + Σ_{i=1}^{n} v_i,

- a → (q, v) ∈ δ_c if a →^{k} q is in δ, and v = k,

- (q, v) → (q', v') ∈ δ_c if q →^{μ(c)/k} q' is in δ, v ⊨ μ, and v' = v + k.

Obviously, the emptiness problem is undecidable for CTAs. We introduce 
hereafter a class of CTAs for which we can prove useful closure and decision 
properties. 

Definition 3. A 0-test counter tree automaton (0-CTA for short) is a CTA
whose rules are such that μ(c) is either true, or is equal to the test ∧_{i=1}^{m} c_i = 0.

Theorem 1. The intersection of a regular tree language and a 0-CTA language 
is a 0-CTA language. Moreover, the emptiness problem of 0-CTAs is decidable 
in nondeterministic polynomial time. 
3 Process Rewrite Systems

3.1 Definition 

Let Var = {X, Y, ...} be a set of process variables, and T_p be the set of process
terms t defined by the following syntax, where X is an arbitrary constant from
Var:

t ::= 0 | X | t · t | t || t

Intuitively, 0 is the null (idle) process and “·” (resp. “||”) denotes sequential
composition (resp. parallel composition). We use both prefix and infix notations
to represent terms.

Definition 4 ([May98]). A Process Rewrite System (PRS for short) is a finite
set of rules of the form t_1 → t_2, where t_1, t_2 ∈ T_p. A PAD is a PRS where the
left-hand sides of the rules do not contain “||”. A PA is a PAD where all the
rules have the form X → t.

A PRS R induces a transition relation →_R over T_p defined by the following rules:

  t_1 → t_2 ∈ R        t_1 →_R t'_1                   t_1 →_R t'_1
  ─────────────   ;   ────────────────────────   ;   ──────────────────────   ;
  t_1 →_R t_2          t_1 || t_2 →_R t'_1 || t_2      t_1 · t_2 →_R t'_1 · t_2

  t_2 →_R t'_2                    t_1 ~_0 0,  t_2 →_R t'_2
  ────────────────────────   ;   ──────────────────────────
  t_1 || t_2 →_R t_1 || t'_2      t_1 · t_2 →_R t_1 · t'_2

where ~_0 is an equivalence between process terms that identifies the terminated
processes. It expresses the neutrality of the null process “0” w.r.t. “||”, and “·”:

A1: t · 0 ~_0 0 · t ~_0 t || 0 ~_0 0 || t ~_0 t
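
The five inference rules can be read as a recursive successor function. The sketch below is an illustration under stated assumptions, not the paper's construction: a term is the string "0", a variable name, or a triple `(op, left, right)` with `op` in {".", "||"}; a PRS is a set of pairs `(lhs, rhs)`; and `t ~_0 0` is tested by checking that `t` contains only 0 leaves. The rule set and the helper `post_bounded` are hypothetical and exist only for the example.

```python
def is_null(t):
    """Test t ~_0 0: the term is built only from 0 using '.' and '||'."""
    if isinstance(t, str):
        return t == "0"
    _, left, right = t
    return is_null(left) and is_null(right)


def successors(t, rules):
    """All t' with t ->_R t' in one step, following the five inference rules."""
    result = {rhs for (lhs, rhs) in rules if lhs == t}  # rule applied at the root
    if isinstance(t, tuple):
        op, left, right = t
        # The left component may always move.
        result |= {(op, l2, right) for l2 in successors(left, rules)}
        # The right component moves freely under '||', but under '.'
        # only when the left component is null.
        if op == "||" or is_null(left):
            result |= {(op, left, r2) for r2 in successors(right, rules)}
    return result


def post_bounded(t, rules, steps):
    """Terms reachable from t in at most `steps` steps (Post* may be infinite)."""
    seen, frontier = {t}, {t}
    for _ in range(steps):
        frontier = {s for u in frontier for s in successors(u, rules)} - seen
        seen |= frontier
    return seen


# Hypothetical PRS: X -> Y.Z, Y -> 0, Z -> X||X.
rules = {("X", (".", "Y", "Z")), ("Y", "0"), ("Z", ("||", "X", "X"))}
start = (".", "Y", "Z")
```

In the example, Z cannot be rewritten in `Y . Z` until Y has terminated, which is the prefix-rewriting discipline imposed by the last inference rule.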

We consider the structural equivalence ~ generated by A1 and:

A2: (t · t') · t'' ~ t · (t' · t'') : associativity of “·”,

A3: t || t' ~ t' || t : commutativity of “||”,

A4: (t || t') || t'' ~ t || (t' || t'') : associativity of “||”.

We denote by ~_s the equivalence induced by the axioms A1 and A2, and by
~_|| the equivalence induced by the axioms A1, A3, and A4. Each equivalence ≡
induces a transition relation ⇒_{≡,R} defined as follows:

∀t, t' ∈ T_p, t ⇒_{≡,R} t' iff ∃u, u' ∈ T_p such that t ≡ u, u →_R u', and u' ≡ t'

Let ⇒*_{≡,R} be the reflexive transitive closure of ⇒_{≡,R}. Let Post*_{R,≡}(t) = {t' ∈
T_p | t ⇒*_{≡,R} t'}, and Pre*_{R,≡}(t) = {t' ∈ T_p | t' ⇒*_{≡,R} t}. These definitions are
extended to sets of terms in the standard way. We omit the subscript ≡ when it
corresponds to the identity (=).
3.2 Reachability Analysis Problem

A PRS process term can be seen as a tree over the alphabet Σ = Σ_0 ∪ Σ_2, where
Σ_0 = {0} ∪ Var and Σ_2 = {·, ||}. A set of terms is regular if it can be represented
by a finite tree automaton.

The ≡-reachability problem consists in, given two regular sets of terms
L_1 and L_2, deciding whether Post*_{R,≡}(L_1) ∩ L_2 ≠ ∅, or equivalently whether
Pre*_{R,≡}(L_2) ∩ L_1 ≠ ∅. Therefore, the basic problems we consider in this paper
are to compute, given a regular set L of terms, representations of the sets
Post*_{R,≡}(L) and Pre*_{R,≡}(L). However, it can immediately be seen that, due to the
associativity of “·” and to the associativity-commutativity of “||”, these reachability
sets are not regular in the cases of the equivalences ~_s, ~_||, and ~. The
approach we follow in this case is to solve the ≡-reachability problem using only
representatives of these reachability sets. Let us introduce this notion through
some definitions: Let t be a process term. We denote by [t]_≡ the equivalence
class modulo ≡ of t, i.e., [t]_≡ = {t' ∈ T_p | t ≡ t'}. This definition is extended
straightforwardly to sets of terms. We say that a set of terms L is ≡-compatible
if [L]_≡ = L. Then, a set of terms L' is a ≡-representative of L if [L']_≡ = L.

Lemma 1. Let L_1, L_2 be two sets of terms, and let L'_1 be a ≡-representative of
L_1. If L_2 is ≡-compatible, then L'_1 ∩ L_2 ≠ ∅ iff L_1 ∩ L_2 ≠ ∅.
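
A toy example may help to see why the compatibility hypothesis in Lemma 1 matters. The sketch below does not use process terms: as a simplifying assumption, words over letters stand in for flat parallel compositions, and two words are equivalent when they are permutations of each other (mimicking the commutativity of “||”). All names and sets are made up for the illustration.

```python
from itertools import permutations


def eq_class(word):
    """Equivalence class of `word` when its letters commute."""
    return {"".join(p) for p in permutations(word)}


def closure(lang):
    """[L]: the union of the equivalence classes of the words of L."""
    out = set()
    for w in lang:
        out |= eq_class(w)
    return out


L1 = {"ab", "ba", "c"}
rep = {"ab", "c"}            # a representative of L1: closure(rep) == L1
L2_good = {"ab", "ba", "d"}  # compatible: closure(L2_good) == L2_good
L2_bad = {"ba"}              # not compatible: its class also contains "ab"
```

With the compatible set `L2_good`, intersecting with `rep` or with `L1` gives the same emptiness answer. With `L2_bad`, `L1` meets it but `rep` does not, so the representative alone would give the wrong answer.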

So, computing regular ≡-representatives of the Post*_{R,≡} and Pre*_{R,≡} images
of regular sets allows solving reachability problems. Actually, these
≡-representative sets do not need to be regular, but only to be in some class of
languages for which the emptiness of the intersection with regular sets is decidable.
This is the case for instance for the class of 0-test counter tree automata.

In the remainder of the paper, we assume w.l.o.g. that PRS are in normal
form, that is, their rules are of the form t_1 → t_2 where t_1 and t_2 are of the
following forms: 0, X, X||Y, or X·Y. Indeed, from the proof of a very close fact
in [May98], it can be shown that with every PRS R over a set of process variables
Var, it is possible to associate a PRS R' in normal form over a new set of
process variables Var' (which extends Var by some auxiliary process variables),
and there exist two (composable) regular relations S_1 and S_2 such that, for
every ≡, Post*_{R,≡} = S_2 ∘ Post*_{R',≡} ∘ S_1 and Pre*_{R,≡} = S_2 ∘ Pre*_{R',≡} ∘ S_1.

4 Reachability Modulo Term Equality and Modulo ~_0

4.1 Reachability Modulo Term Equality 

We prove in this section that for any regular language L, Post*_R(L) and Pre*_R(L)
are effectively regular. Let Sub_r(R) be the set of all the subterms of the right-hand
sides of the rules of R. Let Q_R = {q_t | t ∈ Sub_r(R)}, and let δ_R be the
following transition rules:

— X → q_X, for every X ∈ Sub_r(R),

— ||(q_X, q_Y) → q_{X||Y}, if X||Y ∈ Sub_r(R),

— ·(q_X, q_Y) → q_{X·Y}, if X·Y ∈ Sub_r(R).
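
The rules δ_R can be generated mechanically from the right-hand sides of R. The following is a minimal sketch under assumed encodings: terms are the string "0", a variable name, or a triple `(op, left, right)`; a state q_t is encoded as the pair `("q", t)`; and leaf rules are keyed by `(symbol, ())`. As an additional assumption not spelled out in the list above, the constant 0 is treated like a variable when it occurs in Sub_r(R), so that q_0 accepts {0}.

```python
def subterms(t):
    """All subterms of a process term, the term itself included."""
    if isinstance(t, str):
        return {t}
    _, left, right = t
    return {t} | subterms(left) | subterms(right)


def delta_R(rules):
    """Return (Sub_r(R), delta_R) for a PRS given as a set of (lhs, rhs) pairs."""
    sub = set().union(*(subterms(rhs) for (_, rhs) in rules))
    trans = {}
    for t in sub:
        if isinstance(t, str):
            # Leaf rule X -> q_X (assumption: also 0 -> q_0 when 0 is a subterm).
            trans[(t, ())] = ("q", t)
        else:
            op, left, right = t
            # Binary rule op(q_left, q_right) -> q_t.
            trans[(op, (("q", left), ("q", right)))] = ("q", t)
    return sub, trans


# Hypothetical PRS in normal form: X -> Y.Z, Y -> 0.
example_rules = {("X", (".", "Y", "Z")), ("Y", "0")}
sub, trans = delta_R(example_rules)
```

Each state `("q", t)` has exactly one incoming rule, which is why q_t accepts the single term t.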



It is clear that with δ_R, for every t ∈ Sub_r(R), the state q_t accepts {t} (i.e.,
L_{q_t} = {t}). We define the automaton A*_R = (Q', Σ, F', δ') as follows:

— Q' = {q, q^T, q^nil | q ∈ Q ∪ Q_R}. We denote by q̃ any state in {q, q^T, q^nil}.

— F' = {q, q^T, q^nil | q ∈ F},

— δ' is the smallest set of rules containing δ ∪ δ_R s.t. for every q_1, q_2, q ∈ Q ∪ Q_R:



1. q → q^T ∈ δ' for every q ∈ Q ∪ Q_R,

2. if 0 →*_δ q, then 0 → q^nil ∈ δ',

3. if t_1 → t_2 ∈ R, and there is a state q ∈ Q ∪ Q_R such that t_1 →*_{δ'} q^T,
then:

a) q_{t_2}^T → q^T ∈ δ',

b) q_{t_2}^nil → q^nil ∈ δ', and

c) q_{t_2}^T → q^nil ∈ δ' if t_1 = 0,

4. if ·(q_1, q_2) → q ∈ δ ∪ δ_R, then:

a) ·(q_1^T, q_2) → q^T ∈ δ',

b) ·(q_1^nil, q_2^nil) → q^nil ∈ δ',

c) ·(q_1^nil, q̃_2) → q^T ∈ δ',

5. if ||(q_1, q_2) → q ∈ δ ∪ δ_R, then:

a) ||(q_1^nil, q_2^nil) → q^nil ∈ δ',

b) ||(q̃_1, q̃_2) → q^T ∈ δ',

6. if q → q' ∈ δ, then q^T → q'^T ∈ δ',
and q^nil → q'^nil ∈ δ'.



The definition of δ' is inductive and can be computed as the limit of a finite
sequence of increasing sets of transitions δ'_1 ⊂ δ'_2 ⊂ ... ⊂ δ'_n, where δ'_{i+1}
contains at most three transitions more than δ'_i. These transitions are added by the
inference rules (3). This procedure terminates because there is a finite number
of states in Q ∪ Q_R.



Theorem 2. Let L be a regular set of process terms, and A = (Q, Σ, F, δ) be
a finite tree automaton that recognizes L. Then, Post*_R(L) and Pre*_R(L) are
recognized by the finite tree automata A*_R and A*_{R^{-1}} respectively.

Let us sketch the proof idea. To show that Post*_R(L) is accepted by A*_R, it
suffices to show that for every q ∈ Q ∪ Q_R, the state q^T accepts the successors of
L_q (i.e., L_{q^T} = Post*_R(L_q)) and the state q^nil accepts the successors u of L_q that
have been obtained from null successors of L_q (i.e., L_{q^nil} = Post*_R(Post*_R(L_q) ∩
{u ∈ T_p | u ~_0 0})). In particular this means that for every t ∈ Sub_r(R),
L_{q_t^T} = Post*_R(t), and L_{q_t^nil} = Post*_R(Post*_R(t) ∩ {u ∈ T_p | u ~_0 0}). Rules (1)
express that L_q ⊆ Post*_R(L_q). Rules (2) mark the null leaves with the superscript
nil, indicating that they are null. Rules (3) express that if a term t in Sub_r(R) is
a successor of a term in L_q, then so are all the successors of t. The “||” nodes are
annotated by the rules (5). For example, the intuition formalized by the rules
(5b) is that if u_1 ∈ Post*_R(L_{q_1}), and u_2 ∈ Post*_R(L_{q_2}), then u_1 || u_2 ∈ Post*_R(L_q)
if ||(q_1, q_2) → q ∈ δ ∪ δ_R. The rules (4) annotate the “·” nodes according to the
semantics of “·”. The states q and q^nil play an important role for these rules.
Indeed, the rules (4a) and (4c) ensure that the right child of a “·” node cannot
be rewritten if the left child was not null, i.e., if the left child is not labeled by a
state q^nil. The rules (6) express that if L_q ⊆ L_{q'}, then Post*_R(L_q) ⊆ Post*_R(L_{q'}).
Finally, the fact that Post*_R(L) is accepted by A*_R implies that A*_{R^{-1}} accepts
Pre*_R(L) since Pre*_R(L) = Post*_{R^{-1}}(L).

Remark 1. If $A$ has $k$ states and $\tau$ transitions, then $A^*_R$ has $O(k|\mathit{Subr}(R)| +
|\mathit{Subr}(R)|^2 + \tau)$ transitions and $3(k + |\mathit{Subr}(R)|)$ states.
4.2 Reachability Modulo $\sim_0$

The construction above can be adapted to perform reachability analysis modulo
$\sim_0$.

Theorem 3. Let $R$ be a PRS and $L$ be a regular tree language. Then finite
tree automata recognizing $\mathit{Post}^*_{R,\sim_0}(L)$ and $\mathit{Pre}^*_{R,\sim_0}(L)$ can be effectively
computed in polynomial time.

5 Reachability Modulo $\sim_s$

Computing post*- and pre*-images modulo $\sim_s$ does not preserve regularity
[GD89]. Therefore, we propose in this section to compute $\sim_s$-representatives
of the reachability sets $\mathit{Post}^*_{R,\sim_s}(L)$ and $\mathit{Pre}^*_{R,\sim_s}(L)$. We show hereafter that
for any regular language $L$, we can effectively compute finite tree automata that
recognize $\sim_s$-representatives of $\mathit{Post}^*_{R,\sim_s}(L)$ and $\mathit{Pre}^*_{R,\sim_s}(L)$.

The main difficulty comes from the fact that the rewritings are not done
locally as in the previous case. Indeed, so far, a rule $X \cdot Y \to t$ is applied to a
term $u$ only if $u$ has $X \cdot Y$ as an explicit subterm. This is no longer the case when
we consider terms modulo $\sim_s$. Indeed, this rule should be applied for instance
to the terms $X \cdot (Y \cdot (Z \cdot T))$ and $X \cdot ((Y \cdot Z) \cdot T)$ since they are $\sim_s$-equivalent to
$(X \cdot Y) \cdot (Z \cdot T)$. To be more precise, let us introduce the notion of seq-context:

Definition 5. Let $x \in \mathcal{X}$, a seq-context is a single-variable context $C[x]$ such
that: (1) $x$ is the leftmost leaf of $C$, and (2) all the ancestors of the variable $x$
are labeled by "$\cdot$".

Then, modulo $\sim_s$, a rule $X \cdot Y \to t$ can be applied to any term of the form
$\cdot(X, C[Y])$ for a seq-context $C$, to yield a term that is $\sim_s$-equivalent to $C[t]$. We
define now the relation $\zeta_R$ that performs this transformation for an arbitrary
PRS $R$ as the smallest transition relation over $T_p$ that contains the transition
relation of $R$ modulo $\sim_0$ (since $\sim_s$ includes $\sim_0$) and such that for every rule
$X \cdot Y \to t$ in $R$, and every seq-context
$C$, $(\cdot(X, C[Y]), C[t]) \in \zeta_R$. It can be shown that $\zeta^*_R(L)$ is a $\sim_s$-representative
of $\mathit{Post}^*_{R,\sim_s}(L)$.
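To make the nonlocal rewriting step concrete, here is a small sketch under stated assumptions: terms are encoded as nested tuples `(".", left, right)` with process variables as strings, and the helper names `flatten_seq`, `replace_leftmost` and `zeta_step` are hypothetical. The sketch only covers the step from a term of the form ·(X, C[Y]) to C[t] for a seq-context C; it does not implement the automata construction.

```python
def flatten_seq(term):
    """Leaves read left to right through '.' nodes only.

    Two terms built with '.' are equal modulo associativity of '.'
    exactly when their flattened sequences coincide (other subterms
    are treated as opaque atoms in this sketch).
    """
    if isinstance(term, tuple) and term[0] == ".":
        return flatten_seq(term[1]) + flatten_seq(term[2])
    return [term]


def replace_leftmost(term, y, t):
    """If the leftmost leaf below a chain of '.' nodes is y, return C[t], else None."""
    if isinstance(term, tuple) and term[0] == ".":
        left = replace_leftmost(term[1], y, t)
        return None if left is None else (".", left, term[2])
    return t if term == y else None


def zeta_step(term, x, y, t):
    """Apply the rule x.y -> t to a term of the form .(x, C[y]), yielding C[t]."""
    if isinstance(term, tuple) and term[0] == "." and term[1] == x:
        return replace_leftmost(term[2], y, t)
    return None


u1 = (".", "X", (".", "Y", (".", "Z", "T")))   # X.(Y.(Z.T))
u2 = (".", "X", (".", (".", "Y", "Z"), "T"))   # X.((Y.Z).T)
u3 = (".", (".", "X", "Y"), (".", "Z", "T"))   # (X.Y).(Z.T)
```

All three example terms flatten to the same sequence, and `zeta_step` rewrites `u1` and `u2` even though neither contains `X.Y` as an explicit subterm. The term `u3` is the local case and is deliberately not handled by `zeta_step`.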

Proposition 1. For every PRS $R$ and regular language $L$, $\mathit{Post}^*_{R,\sim_s}(L) =
[\zeta^*_R(L)]_{\sim_s}$ and $\mathit{Pre}^*_{R,\sim_s}(L) = [\zeta^*_{R^{-1}}(L)]_{\sim_s}$.

Then, our next objective is to compute $\zeta^*_R(L)$ for a regular language $L$.

Theorem 4. For every PRS $R$ and every regular language $L$, the set $\zeta^*_R(L)$
is effectively regular. Furthermore, from an automaton with $k$ states and $\tau$
transitions that recognizes $L$, it is possible to construct in polynomial time an
automaton accepting $\zeta^*_R(L)$, with $O((k + |\mathit{Subr}(R)|)^2 |\mathit{Var}|^2 + \tau)$ transitions
and $O(k|\mathit{Var}| + |\mathit{Subr}(R)||\mathit{Var}|)$ states.

We give hereafter the idea behind the construction. When annotating a subterm
of the form $C[t]$ where $C$ is a seq-context and $t$ is the right-hand side of a rule
$X \cdot Y \to t$ in $R$, the automaton guesses when it reaches the root of the subterm
$t$ that at this place there was a $Y$ which was rewritten, and memorizes the
$X$ in its state (moving to a state of the form $(q, X)$) in order to validate its
guess afterwards when reaching the root of the seq-context $C$. At that point, the
automaton checks that the term $C[t]$ it has annotated so far is indeed a successor
of a term of the form $\cdot(X, C[Y])$ accepted at the current state. In such a case,
the automaton considers that $C[t]$ should also be accepted at the same state.
(For that, we add, for every transition rule $\cdot(q_1, q_2) \to q$ in the automaton, a
new transition rule $(q_2, X) \to q$, if $X \in L_{q_1}$.) The crucial point is that it can
be shown that at each node the automaton has to memorize at most one guess.
This is due to the semantics of the "$\cdot$" operator which ensures that at one node,
at most two nonlocal rewritings can occur.

6 Reachability Modulo ~ for PAD Systems 

We consider in this section the problem of reachability analysis (modulo $\sim$) of
the class of PAD processes (see Definition 4). We show that forward analysis
can be done in polynomial time by computing regular representatives of $\mathit{Post}^*_{R,\sim}$
images. Backward analysis can also be done similarly, but only if we start from
a $\sim_{\|}$-compatible set. In the general case, we show that backward reachability
analysis can still be done, but this time by computing nonregular representatives
of $\mathit{Pre}^*_{R,\sim}$ images using 0-test counter tree automata.

6.1 Preliminaries 

Definition 6. A paral-context is a context $C[x_1, \dots, x_n]$ such that all the
ancestors of the variables $x_1, \dots, x_n$ are labeled by "$\|$".

The main difficulty comes from the rules $X \| Y \to t$ that can be applied
nonlocally. More precisely, modulo $\sim$, this rule can be applied to any term of the
form $C[X, Y]$ for a paral-context $C[x_1, x_2]$, to yield a term that is $\sim$-equivalent
to $C[0, t]$. For technical reasons, we introduce a special new process constant "$\hat{0}$"
that is considered as $\sim$-equivalent to $0$, and consider the term $C[\hat{0}, t]$ (which is
$\sim$-equivalent to $C[0, t]$). The difference between "$\hat{0}$" and "$0$" is that "$\hat{0}$" is never
simplified (modulo $\sim_0$) nor rewritten. Technically, it has the role of a marker
which allows to determine the positions where $\|$-rules have been applied. Then,
we have to deal with the presence of "$\hat{0}$" in the terms.

Definition 7. A null-context is a single-variable context $C[x]$ such that all the
leaves other than the variable $x$ are labeled by "$\hat{0}$".

In other words, if $C_0$ is a null-context, then $C_0[X]$ is $\sim$-equivalent to $X$.
Therefore, if $C$ is a paral-context, and $C_0$, $C'_0$ are null-contexts, then the rule $X \| Y \to t$
has to be applied to any term of the form $C[C_0[X], C'_0[Y]]$, to yield a term
$\sim$-equivalent to $C[C_0[\hat{0}], C'_0[t]]$. In the same manner, if $C_s$ is a seq-context,
then a rule of the form $X \cdot Y \to t$ has to be applied to any term of the form
$\cdot(C_0[X], C_s[Y])$, and rewrite it into a term $\sim$-equivalent to $\cdot(C_0[\hat{0}], C_s[t])$.
Observe that on the right child $C_s[Y]$, it is not possible to have $\hat{0}$'s. This is due to
the prefix rewriting policy of "$\cdot$" and to the fact that $\hat{0}$'s are absent from the
initial language. We introduce a relation $\wp_R$ which corresponds to these
transformations. This relation is defined, for a given PRS $R$, as the smallest relation
over $T_p$ which contains $\zeta_R$ (since $\sim$ includes $\sim_s$) and satisfies:

(B1) For every rule $X \| Y \to t$ in $R$, if there exist a paral-context $C$ and two
null-contexts $C_0$ and $C'_0$ such that $t_1 = C[C_0[X], C'_0[Y]]$ and $t_2 = C[C_0[\hat{0}], C'_0[t]]$,
then $(t_1, t_2) \in \wp_R$.

(B2) For every rule $X \cdot Y \to t$ in $R$, if there exist a seq-context $C$, and a
null-context $C_0$ such that $t_1 = \cdot(C_0[X], C[Y])$, and $t_2 = \cdot(C_0[\hat{0}], C[t])$, then
$(t_1, t_2) \in \wp_R$.



Proposition 2. For every PRS $R$ and every language $L$, $\mathit{Post}^*_{R,\sim}(L) =
[\wp^*_R(L)]_{\sim}$ and $\mathit{Pre}^*_{R,\sim}(L) = [\wp^*_{R^{-1}}(L)]_{\sim}$.

6.2 Regular Analysis of PAD Systems 

Let $R$ be a PAD system. Then, given a language $L$, it is easy to see that we
have $\wp^*_R(L) = \zeta^*_R(L)$. This is due to the fact that PAD systems do not contain
rules of the form $X \| Y \to t$. Therefore, by Proposition 2, $\mathit{Post}^*_{R,\sim}$ images can
be computed using the construction of Section 5. Moreover, if we assume that
$L$ is $\sim_{\|}$-compatible, then $\wp^*_{R^{-1}}(L) = \zeta^*_{R^{-1}}(L)$. This is due to the fact that for
PAD systems, the rules of $R^{-1}$ do not create terms with the $\|$ operator, and
thus, their application preserves $\sim_{\|}$-compatibility. Therefore, by Proposition 2,
representatives of $\mathit{Post}^*_{R,\sim}$ images of regular sets, as well as representatives of
$\mathit{Pre}^*_{R,\sim}$ images of $\sim_{\|}$-compatible sets, can be computed using the polynomial
construction of Section 5.

Theorem 5. Let $R$ be a PAD system. Then, for every regular (resp. regular
$\sim_{\|}$-compatible) tree language $L$, a regular $\sim$-representative of the set $\mathit{Post}^*_{R,\sim}(L)$
(resp. $\mathit{Pre}^*_{R,\sim}(L)$) can be effectively constructed in polynomial time.

6.3 Nonregular Backward Reachability Analysis of PAD Systems 

We consider now the remaining problem of computing $\sim$-representatives of
$\mathit{Pre}^*_{R,\sim}$ images starting from any regular set. We show that we can compute
in this case, for any given regular set $L$, a 0-CTA representing $\wp^*_{R^{-1}}(L)$ (which
is, by Proposition 2, a $\sim$-representative of $\mathit{Pre}^*_{R,\sim}(L)$). This allows to perform
the backward reachability analysis of PADs (using Theorem 1 and Lemma 1).

Theorem 6. Let $L$ be a regular language, and let $R$ be a PAD. Then $\wp^*_{R^{-1}}(L)$
can be effectively characterized by a 0-CTA. Furthermore, from an automaton
with $k$ states and $\tau$ transitions that recognizes $L$, it is possible to construct an
automaton accepting p)^_i{L), with 0(2^^ -iVarl"^ ■ {k-\-\Subr{R~^)\)) states, 
and 0(21^“’'!' • \Var\^ ■ {k + \Subr{R~^)\) ■ \Subr{R~^)\ + k ■ \Var\^ ■ 4 !^“’'!') 
transitions. 
Proof (Sketch): For the sake of clarity, we consider here only the most
significant rules of $R^{-1}$, which are of the form $X \| Y \to t$, and apply them only
nonlocally, i.e. as described in (B1). Let $u$ and $u'$ be two terms such that $u' \in \wp^*_{R^{-1}}(u)$,
and s.t. there exist a paral-context $C$, $n$ process variables $A_1, \dots, A_n$, and $n$
terms $t_1, \dots, t_n$ such that $u = C[A_1, \dots, A_n]$, and $u' = C[t_1, \dots, t_n]$. The term
$u'$ is obtained from $u$ after several rewritings on the leaves as follows: There
exist intermediate terms $u_0, \dots, u_k$, and a sequence $(i_1, j_1), \dots, (i_k, j_k)$ of pairs of
indices in $\{1, \dots, n\}$ not necessarily distinct but satisfying the fact that $i_l \neq j_l$
for each $l < k$, such that $u_0 = u$, $u_k = u'$, and $u_{l+1}$ is a successor of $u_l$ by $\wp_{R^{-1}}$
obtained as follows: If $u_l$ is of the form $u_l = C[s_1, \dots, s_n]$, where the $s_j$'s are
terms, then there exists in $R^{-1}$ a rule of the form $X_l \| Y_l \to t$ (or $Y_l \| X_l \to t$)
such that $s_{i_l} = X_l$, $s_{j_l} = Y_l$, and $u_{l+1} = C[s'_1, \dots, s'_n]$ with $s'_{i_l} = \hat{0}$, $s'_{j_l} = t$, and
$s'_j = s_j$ for all the other indices. This means that $u_{l+1}$ is obtained by applying
the rule $X_l \| Y_l \to t$ (or $Y_l \| X_l \to t$) at the positions $(i_l, j_l)$ in the term $u_l$.
Observe that $t$ is either equal to some of the $t_j$'s appearing in $u'$, or it is equal to a
process variable $B$ that will be rewritten later on at a step $l' > l$ (this means that
there exists a rule $B \| B' \to s$ (or $B' \| B \to s$) that can be applied at the positions
$(i_{l'}, j_{l'})$ where $j_l$ is either equal to $i_{l'}$ or to $j_{l'}$).

The automaton has to recognize the term $u'$ as a successor of $u$. For that,
it starts from the leaves and proceeds upwards to the root. At each position $i_l$
(resp. $j_l$), it guesses that the current node has, roughly speaking, "interacted"
with the node $j_l$ (resp. $i_l$) as described above. The automaton has to memorize
all these guesses and validate them when it reaches the root of $u'$. The problem
is that the number of these guesses is not bounded for all possible terms. Let
us show how the automaton behaves to memorize all these guesses: First,
the automaton has a counter $c_X$ for each process variable $X$ in $\mathit{Var}$. When
it guesses that at the positions $(i_l, j_l)$ a rule $X_l \| Y_l \to t$ (or $Y_l \| X_l \to t$) has
been applied, the automaton decrements the counter $c_{X_l}$ at position $i_l$, and
increments this counter at position $j_l$. The idea is that, if the guesses are valid,
the counters must be null at the root of the term $u'$. But this is not sufficient
because the order in which rules are applicable is important. So, in addition, the
automaton memorizes at each position $p \in \{1, \dots, n\}$ a graph $G$ whose edges
are in $(\mathit{Var} \cup \{\top\}) \times \mathit{Var}$ such that: (1) $\top \to X_l$ is in $G$ iff $p = j_l$ and $u'_p \neq \hat{0}$,
and (2) $X_l \to Y_{l'}$ is in $G$ iff $p = i_l = j_{l'}$ (this means that $u'_p = \hat{0}$).

After performing the guesses at the leaves, the automaton percolates
them to the root by decorating the inner nodes as follows: if the term $v_1$
(resp. $v_2$) is decorated with the graph $G_1$ (resp. $G_2$) and the value $c_1$ of the
counters (resp. $c_2$) (we consider the vector of counters $c = (c_{V_1}, \dots, c_{V_m})$,
where $\mathit{Var} = \{V_1, \dots, V_m\}$), then $\|(v_1, v_2)$ is decorated with $G_1 \cup G_2$ and
$c = c_1 + c_2$. Note that there is a finite number of these graphs (there are
at most $2^{|\mathit{Var}|(|\mathit{Var}|+1)}$ possible graphs). The key point of the construction is that
we can prove that the guesses that are performed by the automaton at
the leaves are correct if and only if the automaton reaches the root of the
tree in a configuration where all the counters are null, and the graph $G$
satisfies the property: every vertex appearing in $G$ is reachable from the vertex $\top$. □
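The decoration of parallel nodes and the final check at the root can be sketched as follows. This is an illustrative sketch only: graphs are assumed to be sets of edge pairs, counter vectors are dictionaries from variable names to integers, the special vertex is written `"TOP"`, and the function names `combine` and `accepts_at_root` are hypothetical.

```python
def combine(g1, c1, g2, c2):
    """Decoration of a parallel node: union of the graphs, sum of the counters."""
    graph = set(g1) | set(g2)
    counters = {x: c1.get(x, 0) + c2.get(x, 0) for x in set(c1) | set(c2)}
    return graph, counters


def accepts_at_root(counters, edges, top="TOP"):
    """Root condition: all counters are zero and every vertex
    appearing in the graph is reachable from the top vertex."""
    if any(value != 0 for value in counters.values()):
        return False
    vertices = {v for edge in edges for v in edge}
    reached, stack = {top}, [top]
    while stack:
        v = stack.pop()
        for (a, b) in edges:
            if a == v and b not in reached:
                reached.add(b)
                stack.append(b)
    return vertices <= reached


# Two leaves whose guesses cancel out on counter X and form a chain TOP -> X -> Y.
graph, counters = combine({("TOP", "X")}, {"X": 1}, {("X", "Y")}, {"X": -1})
ok = accepts_at_root(counters, graph)
```

Since the graphs range over a finite set while the counters are unbounded, only the counters require the additional power of a counter automaton; the zero test is needed at the root only.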
7 Conclusion

We have investigated the problem of computing reachability sets of Process 
Rewrite Systems. We have considered PRSs with different operational semantics 
induced by several structural equivalences on terms. These equivalences corre- 
spond to the consideration of different combinations of properties of the used 
operators (associativity, commutativity, neutral element). We have shown that 
this allows to extend and unify the automata-based approaches developed in 
[BEM97,LS98,EHRS00,EP00] for pushdown and PA systems to more general 
classes of models. 

In the full paper, we provide a translation from parallel programs, given
as parallel flow graph systems, to PRS models. Through this translation, we
show that our results for computing post* and pre* images for PRS modulo
$\sim_s$ (Proposition 1 and Theorem 4) can be used to analyze precisely a significant
class of parallel programs with recursive calls (and of course a restricted policy
of communication).

Moreover, we provide an abstract translation from programs to PAD models
which consists in abstracting away synchronizations, but keeping all information
about dynamic creation of processes, recursive calls, and return values of
procedures. This translation refines the one given in [EP00] where return values
of procedures are ignored. Therefore, through our PAD-based abstract semantics,
our results of Theorems 5 and 6 can be used for a conservative analysis of
parallel programs with procedure calls.
Abstract. We consider infinitary two-player perfect information games
defined over graphs of configurations of a pushdown automaton. We show
how to solve such games when winning conditions are Boolean combinations
of a Büchi condition and a new condition that we call unboundedness.
An infinite play satisfies the unboundedness condition if there
is no bound on the size of the stack during the play. We show that the
problem of deciding a winner in such games is EXPTIME-complete.



1 Introduction 

Infinite two-player games with perfect information are one of the central notions 
in verification and in theory of automata on infinite words and trees. The result 
on existence of finite memory strategies for games with Muller conditions is 
a necessary ingredient of most automata constructions [13,15,17]. The other 
important results are those describing ways of solving a game, i.e., finding out 
from which vertices a given player has a winning strategy [16,20]. The mu- 
calculus model checking problem is an instance of a game solving problem [9,8, 
7]. The construction of discrete controllers can be also reduced to the problem 
of solving games [1]. 

In the most standard setting of verification and synthesis one uses just finite 
games. Still the model of pushdown games has attracted some attention [12,2,10, 
11,4,5,14,18]. In this model a graph of a game is given by a configuration graph 
of a pushdown automaton. Such games are more suitable to model phenomena 
like procedure invocation because stack is explicitly present in the model. 

Standard Muller or parity winning conditions are very useful and natural
for the applications mentioned above. Their expressiveness is also satisfactory
as any game with S1S definable (i.e. regular) winning conditions can be reduced
to a game with Muller or parity conditions. As noted in [6], for pushdown games
the situation changes and there exists “natural winning conditions exploiting
the infinity of pushdown transition graphs”.

We propose a new winning condition for pushdown games that we call
unboundedness: an infinite play satisfies the unboundedness condition if there is no
bound on the size of the stack during the play. We consider Boolean combinations
of this condition and the parity condition, for example, a condition saying that
a stack is unbounded and some state appears infinitely often. We characterize
conditions for which there is a strategy with finite memory for both players. We
show that the problem of deciding a winner in pushdown games with Boolean
combinations of Büchi and unboundedness conditions is EXPTIME-complete (in
the size of the automaton defining the game graph).

Research reported here was motivated by the paper of Cachat, Duparc and 
Thomas [6]. They consider the same class of games as we do here, but only a 
single winning condition: some configuration repeats infinitely often on the play. 
The negation of this condition is “strict stack unboundedness”: every config- 
uration appears only finitely often on the play. While “strict unboundedness” 
is a more restrictive condition than unboundedness, we show that the two are 
equivalent if considered in disjunction with a parity condition. In particular, in 
a pushdown game, a position is winning with unboundedness condition if and 
only if it is winning with strict unboundedness condition. 

As mentioned above, numerous verification and synthesis problems are re- 
ducible to the problem of solving games. Hence, the algorithms that we propose 
here can be used to extend the class of properties that can be model checked or 
for which synthesis is possible. To give a simple example, our algorithms can be 
used to solve the problem of checking that on every path of a given pushdown 
system where the stack is unbounded some LTL property holds. 

In summary we show the following results. (1) For every Boolean combi- 
nation of conditions “states from a given set appear infinitely often” and “un- 
boundedness” , there is an EXPTIME-algorithm deciding who is the winner in 
a given configuration (Theorem 1). (2) For the conditions of the form “parity 
or unboundedness” from every configuration one of the players has a memory- 
less winning strategy (Theorem 3). (3) In the games with the condition “states 
from a given set appear infinitely often” and “unboundedness” player 0 may 
need infinite memory in order to win. Hence it is a rare case of a game which is 
algorithmically tractable but does not admit finite memory strategies (Example 
on page 97). 

Due to the page limit we have decided to put just representative fragments 
of the proofs into this paper. The complete proofs can be found in [3]. The proof 
methods in all the cases are reductions to finite graphs of exponential size (in 
the size of the pushdown system) but with a constant number of colors [18, 
19]. It is not evident how to use the elegant method from [11] if only because 
of non-existence of memoryless strategies as demonstrated in the example on 
page 97. 

2 Definitions 

Infinite two-player games. An infinite two-player game on a finite or infinite
graph $(V, E)$ is a tuple $G = (V, V_0, V_1, E, \mathit{Acc} \subseteq V^\omega)$ where $(V_0, V_1)$ is a partition
of $V$. The set of vertices $V_0$ describes the positions for player 0, and $V_1$ those
for player 1, whereas $\mathit{Acc}$ defines the infinitary winning condition. In figures, we
will denote positions of player 0 by ovals and those of player 1 by squares.

Two players, player 0 and player 1, play on $G$ by moving a token between
vertices. A play from an initial position $p_0 \in V_0$ proceeds as follows: player 0
moves the token to a new position $p_1$; then the player to whom $p_1$ belongs makes
a move reaching $p_2$, and so on. Similarly, we define a play starting in $V_1$, where
player 1 begins. If one of the players cannot make a move, the other player wins.
Otherwise, the play is infinite and results in an infinite path $p = p_0 p_1 p_2 \cdots \in V^\omega$
in the game graph. Player 0 wins if $p \in \mathit{Acc}$, otherwise player 1 is the winner.

Pushdown systems. A pushdown system is a tuple $\mathcal{A} = (Q, \Gamma, \Delta)$ where $Q$ is a
finite set of states, $\Gamma$ is a finite set of stack symbols, and

$\Delta : Q \times \Gamma \to \mathcal{P}(\{pop(q), push(q, b) : q \in Q, b \in \Gamma\})$

is the transition relation. A configuration of $\mathcal{A}$ is a pair $(q, u)$ with $q \in Q$ and
$u \in \Gamma^*$. It denotes a global state of the pushdown system which consists of a
control state and the contents of the stack; the top of the stack is described by
the first letter of the word $u$. A pushdown system $\mathcal{A}$ defines an infinite graph
$Gr(\mathcal{A})$, called the pushdown graph, whose nodes are the configurations of $\mathcal{A}$ and whose
edges are defined by the transitions, i.e., from a node $(p, au)$ we have edges to:

$(q, bau)$ whenever $push(q, b) \in \Delta(p, a)$;

$(q, u)$ whenever $pop(q) \in \Delta(p, a)$.

Observe that any configuration with an empty stack has no successors. The
degree of every node is finite and bounded by $|Q|(1 + |\Gamma|)$.
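The edge relation of the pushdown graph can be illustrated by a short sketch. The encoding is an assumption made for illustration: a stack is a Python string whose first character is the top symbol, the transition relation is a dictionary `delta` from (state, symbol) pairs to lists of moves, and moves are written `("push", q, b)` and `("pop", q)`. The function name `successors` is hypothetical.

```python
def successors(config, delta):
    """Successor configurations of (q, stack) in the pushdown graph."""
    q, stack = config
    if not stack:
        # A configuration with an empty stack has no successors.
        return []
    top, rest = stack[0], stack[1:]
    result = []
    for move in delta.get((q, top), ()):
        if move[0] == "push":
            _, q2, b = move
            result.append((q2, b + stack))   # (p, au) -> (q, bau)
        else:
            _, q2 = move
            result.append((q2, rest))        # (p, au) -> (q, u)
    return result


# Hypothetical system: from state p on top symbol a, push b and go to q, or pop and stay in p.
delta = {("p", "a"): [("push", "q", "b"), ("pop", "p")]}
next_configs = successors(("p", "aa"), delta)
```

For each pair (p, a) there are at most |Q| pop moves and |Q||Γ| push moves, which gives the stated degree bound.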

Pushdown games. We have mentioned above that a pushdown system $\mathcal{A} =
(Q, \Gamma, \Delta)$ defines a pushdown graph $Gr(\mathcal{A}) = (V, E)$. Now suppose that we have
a partition $(Q_0, Q_1)$ of $Q$ and an acceptance condition $\mathit{Acc} \subseteq V^\omega$. These allow to
define a pushdown game $(V, V_0, V_1, E, \mathit{Acc})$ where: $V_0 = \{(p, u) : p \in Q_0, u \in \Gamma^*\}$
and $V_1 = \{(p, u) : p \in Q_1, u \in \Gamma^*\}$.

Strategies, winning positions and determinacy. A strategy for player 0 is a
function $\sigma : V^* V_0 \to V$ assigning to every partial play $p = p_0 p_1 \cdots p_n$ ending in a
vertex from $V_0$ a vertex $\sigma(p) \in V$ such that $E(p_n, \sigma(p))$ holds. A strategy for
player 0 is called memoryless if it depends only on the current position, i.e.: for
every $v \in V_0$, and every $p, q \in V^*$ we have $\sigma(pv) = \sigma(qv)$. A play $p = p_0 p_1 \cdots$
respects a strategy $\sigma$ for player 0 if whenever $p_i \in V_0$, then $p_{i+1} = \sigma(p_0 p_1 \cdots p_i)$.
In this case player 0 is said to follow the strategy $\sigma$ during the play $p$. A
strategy $\sigma$ is winning for player 0 from a position $p_0$ if all the plays beginning in $p_0$
and respecting $\sigma$ are winning. A position $p_0$ is winning for player 0 if he has a
winning strategy from $p_0$. By $W_0(G)$ we denote the set of winning positions for
player 0 in the game $G$. Similarly we define strategies and the set $W_1(G)$ of the
winning positions for player 1.

Let $\sigma$ be a memoryless strategy for player 0 in an infinite two-player game
$G = (V, V_0, V_1, E, \mathit{Acc})$. Strategy $\sigma$ defines a game where only player 1 plays:
$G(\sigma) = (V', V'_0, V'_1, E', \mathit{Acc})$. The set of vertices $V'$ is the set of positions from
$V$ where $\sigma$ is winning in $G$, $V'_0 = V_0 \cap V'$, $V'_1 = V_1 \cap V'$ and the edge relation
is defined by $E' = (E \cap (V'_1 \times V')) \cup \{(v, v') : v \in V'_0, v' = \sigma(v)\}$. Note that all
the plays in $G(\sigma)$ are winning for player 0.

A strategy with a memory $M$ is a tuple $\sigma = (\varphi, up, m_0)$ where: $\varphi : M \times V \to V$,
$up : M \times V \to M$, $m_0 \in M$. Intuitively $m_0$ is an initial state of the memory,
$up$ is a function that updates the memory according to the moves played, and $\varphi$
is a function giving the next move depending on the memory of the play so far
and the current position.

The memory update function is extended to sequences of positions as
follows: $up^*(m, \varepsilon) = m$ and $up^*(m, p p_i) = up(up^*(m, p), p_i)$. Thus a strategy with
memory $\sigma$ is defined by: $\sigma(p p_i) = \varphi(up^*(m_0, p p_i), p_i)$.
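The extension of the update function to sequences is a left fold, and the induced strategy applies the move function to the folded memory and the last position. A minimal sketch follows; the names `up_star` and `strategy_move` are hypothetical, and the toy memory (a parity bit of the number of positions seen) together with the toy move function are assumptions chosen only to make the example executable.

```python
def up_star(up, m, positions):
    """Extension of the update function to a sequence: up*(m, empty) = m and
    up*(m, p.p_i) = up(up*(m, p), p_i)."""
    for position in positions:
        m = up(m, position)
    return m


def strategy_move(phi, up, m0, play):
    """Move prescribed after the non-empty partial play `play`:
    phi(up*(m0, play), last position of play)."""
    memory = up_star(up, m0, play)
    return phi(memory, play[-1])


# Toy instance: positions are integers, the memory is the parity of the play length.
up = lambda m, v: (m + 1) % 2
phi = lambda m, v: v + 1 if m == 0 else v + 2

move = strategy_move(phi, up, 0, [0, 1, 2])
```

A memoryless strategy is the special case where the memory set has a single element, so that the move depends on the last position only.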

Winning conditions. In this paper we consider winning conditions that are
Boolean combinations of parity and unboundedness conditions. The
unboundedness winning condition $\mathit{Acc}_U$ says that there is no bound on the size of the
stack during the play. A parity winning condition $\mathit{Acc}_\Omega$ depends on a
coloring function $\Omega : Q \to \{0, \dots, d\}$ extended to positions of pushdown games by
$\Omega((q, u)) = \Omega(q)$. An infinite path $v$ is winning for player 0 if in the sequence
$\Omega(v) = \Omega(v_0)\Omega(v_1)\cdots$ the smallest color appearing infinitely often is even.
Formally, we have:

$\mathit{Acc}_U = \{(p_0, u_0)(p_1, u_1)\cdots \in V^\omega : \limsup_{i \to \infty} |u_i| = \infty\}$

$\mathit{Acc}_\Omega = \{(p_0, u_0)(p_1, u_1)\cdots \in V^\omega : \liminf_{i \to \infty} \Omega(p_i) \text{ is even}\}$
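Both conditions can be checked directly on ultimately periodic plays, which gives a quick way to test one's understanding of the definitions. The sketch below assumes a play of the form prefix followed by a cycle repeated forever; the cycle is given once as its list of colors and once as its list of stack-height changes (+1 for a push, -1 for a pop), and it is assumed that the cycle can actually be repeated forever (the stack never empties). The function names are hypothetical.

```python
def parity_holds(cycle_colors):
    """Parity condition on prefix.(cycle)^omega: the colors seen infinitely
    often are exactly those on the cycle, so only the cycle matters."""
    if not cycle_colors:
        raise ValueError("the cycle must be non-empty")
    return min(cycle_colors) % 2 == 0


def unbounded_holds(cycle_height_changes):
    """Unboundedness on a repeatable cycle: the stack height has no bound
    iff each round of the cycle strictly increases the height."""
    return sum(cycle_height_changes) > 0


# A play whose cycle sees only color 1 but pushes more than it pops:
# it loses for the parity condition alone but wins for "parity or unboundedness".
wins_union = parity_holds([1, 1, 1]) or unbounded_holds([+1, +1, -1])
```

General plays need not be ultimately periodic, so this sketch is only a sanity check of the two definitions and not a decision procedure for the games.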

Conditional pushdown games. From a pushdown game $G$ we can define for any
subset $R \subseteq Q$ a conditional game $G(R)$ where the winning plays for player 0 and
player 1 are the same as those in $G$, except for the plays reaching a configuration
of the form $(q, \varepsilon)$ (i.e. a configuration with the empty stack). A play reaching
$(q, \varepsilon)$ is declared winning for player 0 in $G(R)$ iff $q \in R$.

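3 Solving Games: Boolean Combinations of Büchi and
Unboundedness

Consider a pushdown game $G$ defined by a pushdown system $(Q, \Gamma, \Delta)$ and a
partition of states $(Q_0, Q_1)$. We assume here that we have a priority function
$\Omega : Q \to \{0, 1\}$. The conditions we are interested in are in one of the four forms:

$\mathit{Acc}_\Omega \cup \mathit{Acc}_U \qquad \mathit{Acc}_\Omega \cup \overline{\mathit{Acc}_U} \qquad \mathit{Acc}_\Omega \cap \mathit{Acc}_U \qquad \mathit{Acc}_\Omega \cap \overline{\mathit{Acc}_U} \qquad (1)$

where $\overline{\mathit{Acc}_\Omega}$ stands for the complement of $\mathit{Acc}_\Omega$ and similarly for $\overline{\mathit{Acc}_U}$. These
four cases cover all possible Boolean combinations of $\mathit{Acc}_\Omega$ and $\mathit{Acc}_U$ as, for
example, the condition $\overline{\mathit{Acc}_\Omega} \cap \mathit{Acc}_U = \overline{\mathit{Acc}_\Omega \cup \overline{\mathit{Acc}_U}}$ is just the winning condition
for player 1 in the game with condition $\mathit{Acc}_\Omega \cup \overline{\mathit{Acc}_U}$. The main result of the
paper is: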
Theorem 1. The problem of deciding a winner from a given position in a given
pushdown game with conditions as in (1) is EXPTIME-complete.

4 Büchi Union Unboundedness

We present a solution for pushdown games with a condition $\mathit{Acc}_U \cup \mathit{Acc}_\Omega$ where
the range of $\Omega$ is $\{0, 1\}$. Given a pushdown game $G$ with states labelled with
priorities 0 and 1 we want to decide whether player 0 has a strategy to make
the stack unbounded or see states labelled 0 infinitely often. To this end we
construct a finite-state Büchi game $\widetilde{G}$ with coloring on edges as follows:



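[Figure: the positions of $\widetilde{G}$ and the edges between them: $(p, a, R_0, R_1)$, $CM(p, a, R_0, R_1, q, b, S_0, S_1)$, $(q, b, S_0, S_{\Omega(q)})$, $(s_0, a, R_0, R_0)$, $(s_1, a, R_0, R_1)$.]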

The positions of the game are as in the picture for every $p, q \in Q$, $a, b \in \Gamma$
and $R_0, R_1, S_0, S_1 \subseteq Q$. From $(p, a, R_0, R_1)$ the edge to tt exists when we have
$pop(q) \in \Delta(p, a)$ and $q \in R_{\Omega(q)}$. The edge to ff exists when we have $pop(q) \in
\Delta(p, a)$ and $q \notin R_{\Omega(q)}$. The edge to $CS(p, a, R_0, R_1, q, b)$ exists when we have
$push(q, b) \in \Delta(p, a)$. From a position $CS(p, a, R_0, R_1, q, b)$ there is an edge to
$CM(p, a, R_0, R_1, q, b, S_0, S_1)$ for every $S_0, S_1 \subseteq Q$. From a position of the form
$CM(p, a, R_0, R_1, q, b, S_0, S_1)$ there is an edge, called push edge, to $(q, b, S_0, S_{\Omega(q)})$,
and edges to $(s_0, a, R_0, R_0)$ and $(s_1, a, R_0, R_1)$ for every $s_0 \in S_0$ and every $s_1 \in
S_1$, respectively. The positions for player 0 are all the $(p, a, R_0, R_1)$ where $p \in Q_0$,
all the positions of the form $CS(\dots)$ and the position ff. The other positions
are for player 1.
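The edges of the finite game can be enumerated mechanically. The sketch below is an illustration under explicit assumptions: positions are encoded as tagged tuples (`"node"`, `"CS"`, `"CM"`) or the strings `"tt"` and `"ff"`, returning sets are frozensets, the push edge is assumed to lead to (q, b, S0, S_Ω(q)), and the challenge edges are assumed to lead to (s0, a, R0, R0) and (s1, a, R0, R1). The names `subsets` and `game_successors` are hypothetical, and edge colors are not modelled.

```python
from itertools import chain, combinations


def subsets(states):
    """All subsets of a finite set of states, as frozensets."""
    items = sorted(states)
    picks = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(pick) for pick in picks]


def game_successors(pos, delta, omega, states):
    """Successors of a position of the finite game (colors omitted)."""
    if pos in ("tt", "ff"):
        return []
    kind = pos[0]
    if kind == "node":
        _, p, a, r0, r1 = pos
        out = []
        for move in delta.get((p, a), ()):
            if move[0] == "pop":
                q = move[1]
                # Popping to q is fine iff q belongs to the returning set R_{omega(q)}.
                out.append("tt" if q in (r0, r1)[omega[q]] else "ff")
            else:
                _, q, b = move
                out.append(("CS", p, a, r0, r1, q, b))
        return out
    if kind == "CS":
        # Player 0 chooses the new returning sets S0, S1.
        return [("CM",) + pos[1:] + (s0, s1)
                for s0 in subsets(states) for s1 in subsets(states)]
    # kind == "CM": player 1 accepts the push or challenges a returning state.
    _, p, a, r0, r1, q, b, s0, s1 = pos
    out = [("node", q, b, s0, (s0, s1)[omega[q]])]
    out += [("node", s, a, r0, r0) for s in sorted(s0)]
    out += [("node", s, a, r0, r1) for s in sorted(s1)]
    return out


states = {"p", "q"}
omega = {"p": 1, "q": 0}
delta = {("p", "a"): [("pop", "q"), ("pop", "p"), ("push", "q", "b")]}
R0, R1 = frozenset({"q"}), frozenset()
start = ("node", "p", "a", R0, R1)
```

Every `CS` position has 4^|Q| successors, which is the source of the exponential size of the finite game.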

Intuitively, a play in $\widetilde{G}$ gives a compact description of a play in $G$. A node
$(p, a, R_0, R_1)$ stands for a position $(p, au)$ in $G$ where we require that if $a$ is
popped then player 0 wins only if the control state reached when popping $a$ is
in $R_0$ or $R_1$ depending whether a configuration of priority 0 was visited in the
meantime. This explains the edges to tt and ff. If in $G$ there is a $push(q, b)$
move from $(p, au)$ then player 0 has to determine the new returning sets $R'_0$ and
$R'_1$ associated with the configuration $(q, bau)$, that is the control states that he
considers “safe” when $b$ is eventually popped. This choice can be considered as
a choice of a strategy. Now, player 1 can agree that the chosen vertices are safe
and allow to play the push move. Otherwise he can challenge the choice and pick
a state $q'$ from $R'_0$ or $R'_1$. This corresponds to a non push edge.

The following theorem shows the correspondence between $G$ and $\widetilde{G}$. Note
that the connection uses conditional pushdown games.
Theorem 2. For an unboundedness union Büchi winning condition, for every
$a \in \Gamma$, $q \in Q$ and $R \subseteq Q$: $(q, a) \in W_0(G(R))$ iff $(q, a, R, R) \in W_0(\widetilde{G})$.

This theorem provides an algorithm to decide whether a configuration of the 
form (q, a) is winning in a conditional pushdown game. Using methods from [14] 
we can obtain a description of all the winning positions. The proof of the theorem 
will follow from Lemmas 3 and 6. 



4.1 Reciprocal Implication 

We take a memoryless winning strategy $\widetilde{\sigma}_0$ for player 0 in $\widetilde{G}$. We assume that
the strategy is winning from every winning vertex for player 0. We will construct
a winning strategy for player 0 in $G$. Every path in $\widetilde{G}(\widetilde{\sigma}_0)$ is winning. Hence for
a position $\pi$ there is a finite number of edges of priority 1 that can be reached
from $\pi$ without taking an edge labeled by 0. Let $h(\pi)$ be this number.

We start with a technical definition which describes what game positions
$(q, b, S_0, S_1)$ can appear after a given position $\pi$.
Definition 1. We say that a pair of sets of states $(S_0, S_1)$ is selected for a
position $\pi = (p, a, R_0, R_1)$ and $q \in Q$, $b \in \Gamma$, denoted $(S_0, S_1) \in sel(\pi, q, b)$, if
there exist $q' \in Q$, $S'_1, S''_1 \subseteq Q$ such that in $\widetilde{G}(\widetilde{\sigma}_0)$ there is a sequence:

$\pi \to CS(p, a, R_0, R_1, q', b) \to CM(p, a, R_0, R_1, q', b, S_0, S'_1) \to (q', b, S_0, S''_1)$

and $(q, b, S_0, S_1)$ is reachable from $(q', b, S_0, S''_1)$ without passing through a push
edge. A pair $(S_0, S_1)$ is 1-selected, denoted $(S_0, S_1) \in sel_1(\pi, q, b)$, if we have a
sequence as above with $\Omega(q') = 1$ and $(q, b, S_0, S_1)$ reachable from $(q', b, S_0, S''_1)$
without passing through 0-labelled edges.

The winning strategy $\widetilde{\sigma}_0$ provides a compact description of a strategy in $G$.
If $\widetilde{\sigma}_0$ suggests a push move then in $G$ the same move can be taken. If it suggests
a move to tt, then player 0 has to “reconstruct” the play and for this he needs
returning sets and a stack as an extra memory. The memory consists of the
sequences of nodes from $\widetilde{G}$. The initial memory is the sequence with one element
$(q, a, R, R)$.

Definition 2. We define a strategy $\sigma_0$ using $\widetilde{\sigma}_0$ as follows. Suppose that the
current position in the play in $G$ is $(q_n, a_n \dots a_1)$ and the current memory is
$m = \pi_n \dots \pi_1$ where $\pi_i = (q_i, a_i, R^i_0, R^i_1)$ for $i = 1 \dots n$.

– If in $\widetilde{G}(\widetilde{\sigma}_0)$ there is a sequence:

$\pi_n \to CS(q_n, a_n, R^n_0, R^n_1, p, c) \to CM(q_n, a_n, R^n_0, R^n_1, p, c, S_0, S_1)$

then we define

$up(m, (p, c a_n \dots a_1)) = (p, c, S_0, S_{\Omega(p)}) \pi_n \dots \pi_1$

Moreover if $q_n \in Q_0$ then we put:

$\sigma_0(m, (q_n, a_n \dots a_1)) = (p, c a_n \dots a_1)$

– If in $\widetilde{G}(\widetilde{\sigma}_0)$ there is an edge $\pi_n \to$ tt then, for every $p \in R^n_{\Omega(p)}$ such that
$pop(p) \in \Delta(q_n, a_n)$ we have

$up(m, (p, a_{n-1} \dots a_1)) = \pi'_{n-1} \pi_{n-2} \dots \pi_1$

where we set $\pi'_{n-1} = (p, a_{n-1}, R^{n-1}_0, R^{n-1}_1)$ if $\Omega(p) = 1$ and $(R^n_0, R^n_1) \in
sel_1(\pi_{n-1}, q_n, a_n)$. Otherwise we put $\pi'_{n-1} = (p, a_{n-1}, R^{n-1}_0, R^{n-1}_0)$.
Moreover if $q_n \in Q_0$ then we take some $p \in R^n_{\Omega(p)}$ with $\Omega(p) = 0$ if possible, and
define:

$\sigma_0(m, (q_n, a_n \dots a_1)) = (p, a_{n-1} \dots a_1)$

For all other cases the update function and the strategy function are undefined.

The next definition and lemma state the invariants that are true when player
0 follows the strategy $\sigma_0$.

Definition 3. Consider a memory $m = \pi_n \dots \pi_1$ where $\pi_i = (q_i, a_i, R^i_0, R^i_1)$, for
$i = 1 \dots n$. We say that $m$ is consistent if all $\pi_i$ are positions of $\widetilde{G}(\widetilde{\sigma}_0)$ and for
all $i = 1, \dots, n-1$ we have $(R^{i+1}_0, R^{i+1}_1) \in sel(\pi_i, q_{i+1}, a_{i+1})$. We say that $m$ is
proper if $(R^n_0, R^n_1) \in sel_1(\pi_{n-1}, q_n, a_n)$. The height of $m$ is $h(m) = h(\pi_n)$. The
size of $m$, denoted $|m|$, is $n$, i.e., the number of elements in $m$. We denote by
$tail(m)$ the memory $\pi_{n-1} \dots \pi_1$.

Lemma 1. Suppose that while playing according to strategy $\sigma_0$ a position of the
form $(q_n, a_n \dots a_1)$ is reached with a consistent memory $m = \pi_n \dots \pi_1$ where
$\pi_i = (q_i, a_i, R^i_0, R^i_1)$. The next move in $G$ is to one of the
following positions:

$(p, c a_n \dots a_1)$. The updated memory is $m' = (p, c, S_0, S_{\Omega(p)}) \pi_n \dots \pi_1$ and it is
consistent. Moreover if $\Omega(p) = 1$ then $m'$ is proper.

$(p, a_{n-1} \dots a_1)$. The updated memory becomes $m' = (p, a_{n-1}, R^{n-1}_0, R) \pi_{n-2}
\dots \pi_1$ and $m'$ is consistent. Moreover we have $h(m') \le h(tail(m))$. If $m$
is proper and $\Omega(p) = 1$ then $h(m') < h(tail(m))$, in addition if $tail(m)$ is
proper then $m'$ is proper.

Lemma 2. Consider two positions $i < j$ such that $|m_i| = |m_j|$ and such that
$|m_k| > |m_i|$ for all $i < k < j$. We have that $h(m_i) \ge h(m_j)$. If only states of
priority 1 appear between $i$ and $j$ then we have that $h(m_i) > h(m_j)$.

Lemma 3. The strategy $\sigma_0$ is winning.

Proof. To show that $\sigma_0$ is winning, consider a play $(q_1, u_1), (q_2, u_2), \ldots$ respecting
$\sigma_0$ and let $m_1, m_2, \ldots$ be the associated memories.

Assume that the stack is bounded. As, by Lemma 1, the size of memory
at a position is the same as the size of stack at this position, we have that
there is a size of memory that is infinitely repeated. Let $N$ be the smallest size
infinitely repeated. Therefore, there is a sequence of positions $i_1, i_2, \ldots$ such that
$|m_{i_j}| = N$ for every index $i_j$ and such that $|m_k| \ge N$ for all $k \ge i_1$. For a contradiction,
assume that only states of priority 1 appear infinitely often. Therefore, there is
some $k$ such that after $i_k$ only states colored by 1 are visited. By Lemma 2,
we have that $h(m_{i_j}) > h(m_{i_{j+1}})$ for all $j \ge k$. As $<$ is well-founded, this is not
possible. Therefore the Büchi condition is satisfied.
4.2 Direct Implication

Let $\tilde{\sigma}_1$ be a memoryless strategy for player 1 in $\tilde{G}$. We assume that $\tilde{\sigma}_1$ is winning
from all the positions winning for player 1. We translate $\tilde{\sigma}_1$ into a strategy with
memory for player 1 in $G$. The memory consists of sequences of nodes of
$\tilde{G}$. The initial memory is the one-element sequence $(q, a, R, R)$.

Every path in $\tilde{G}(\tilde{\sigma}_1)$ is winning for player 1, hence for every position $\pi$ in it,
there is a finite number of positions of priority 0 that can be reached. Let $h(\pi)$
be this number. For a position $\pi = (p, a, R_0, R_1)$ in $\tilde{G}$ we define two sets:

$T_0(\pi) = \{q : h((q, a, R_0, R_0)) < h(\pi)\}$

$T_1(\pi) = \{q : h((q, a, R_0, R_1)) < h(\pi)\}$

Definition 4. We define a strategy $\sigma_1$ using $\tilde{\sigma}_1$ as follows. Suppose that the
current position of the play in $G$ is $(q_n, a_n u)$ and the current state of memory is
$m = \pi_n \ldots \pi_1$ where $\pi_i = (q_i, a_i, R_0^i, R_1^i)$ for $i = 1 \ldots n$.

— If the move from $\pi_n$ to $GS(q_n, a_n, R_0^n, R_1^n, p, c)$ is possible in $\tilde{G}(\tilde{\sigma}_1)$ then we
set

$up(m, (p, c a_n u)) = (p, c, S_0, S_{\Omega(p)})\, \pi_n \ldots \pi_1$

where $S_0 = Q \setminus T_0(\pi_n)$ and $S_1 = Q \setminus T_1(\pi_n)$ (note that $\Omega(p) \in \{0, 1\}$).

If $q_n \in Q_1$ then additionally we define:

$\sigma_1(m, (q_n, a_n u)) = (p, c a_n u)$

— If the move from $\pi_n$ to ff is possible in $\tilde{G}(\tilde{\sigma}_1)$ then for $p$ such that $p \notin R^n_{\Omega(p)}$
and $pop(p) \in \delta(q_n, a_n)$ we set:

$up(m, (p, u)) = \pi'_{n-1}\, \pi_{n-2} \ldots \pi_1$ with $\pi'_{n-1} = (p, a_{n-1}, R_0^{n-1}, R)$

where $R = R_0^{n-1}$ if $p \in T_0(\pi_{n-1})$, and $R = R_1^{n-1}$ otherwise.

If $q_n \in Q_1$ then we take $p$ with $\pi'_{n-1}$ of the first kind if possible, if not then
of the second kind, and define

$\sigma_1(m, (q_n, a_n u)) = (p, u)$

For all other cases the update function and the strategy function are not defined. 

Definition 5. Consider a memory $m = \pi_n \ldots \pi_1$ where $\pi_i = (q_i, a_i, R_0^i, R_1^i)$
for $i = 1 \ldots n$. We say that $m$ is consistent if all $\pi_i$ are positions of $\tilde{G}(\tilde{\sigma}_1)$ and
$R_0^{i+1} = Q \setminus T_0(\pi_i)$ and $R_1^{i+1} = Q \setminus T_0(\pi_i)$ or $R_1^{i+1} = Q \setminus T_1(\pi_i)$ for $i = 1, \ldots, n-1$.
We say that $m$ is glued if $R_0^n = R_1^n$. The height of $m$ is $h(m) = h(\pi_n)$. The size
of $m$, denoted $|m|$, is $n$, i.e., the number of elements in $m$. We denote by $tail(m)$
the memory $\pi_{n-1} \ldots \pi_1$.

Lemma 4. Suppose that when following the strategy $\sigma_1$ a position $(q_n, a_n \ldots a_1)$
is reached with a consistent memory $m = \pi_n \ldots \pi_1$ where $\pi_i = (q_i, a_i, R_0^i, R_1^i)$,
for $i = 1 \ldots n$. The next move in $G(\sigma_1)$ is to one of the following positions:

$(p, c a_n \ldots a_1)$. The updated memory is $m' = (p, c, S_0, S_{\Omega(p)})\, \pi_n \ldots \pi_1$ and it is
consistent. We have that $h(m') < h(m)$. In addition, if $\Omega(p) = 0$ then $m'$ is
glued.

$(p, a_{n-1} \ldots a_1)$. The updated memory is $m' = (p, a_{n-1}, R_0^{n-1}, R)\, \pi_{n-2} \ldots \pi_1$ and
it is consistent. Moreover, if $m$ is glued or $\Omega(p) = 0$, then $h(m') < h(tail(m))$
and $m'$ is glued. Otherwise, we have that $h(m') \le h(tail(m))$.

Lemma 5. Consider two positions $i < j$ such that $|m_i| = |m_j|$ and such that
$|m_k| > |m_i|$ for all $i < k < j$. We have that $h(m_i) \ge h(m_j)$. Moreover, if a state
of priority 0 appears between $i + 1$ and $j$ then $h(m_i) > h(m_j)$ and $m_j$ is glued.

Lemma 6. Strategy $\sigma_1$ is winning.

Proof. To show that $\sigma_1$ is winning, consider a play $(q_1, u_1), (q_2, u_2), \ldots$ respecting
$\sigma_1$ and let $m_1, m_2, \ldots$ be the associated memories. Using Lemma 4 it follows
that for every $i < j$, if $|m_i| < |m_k|$ for all $i < k < j$ and $|m_i| < |m_j|$ then
$h(m_i) > h(m_j)$. As by Lemma 4 the size of the stack at a position is the same
as the size of memory at this position, we conclude that the size of the stack
is bounded, and there is a size of memory that is infinitely repeated. Let $N$ be
the smallest size infinitely repeated. Therefore, there is a sequence of positions
$i_1, i_2, \ldots$ such that $|m_{i_j}| = N$ for every index $i_j$ and such that $|m_k| \ge N$ for all
$k \ge i_1$. From Lemma 5, for every $k \ge 1$, if 0 appears between $i_k + 1$ and $i_{k+1}$
then $h(m_{i_k}) > h(m_{i_{k+1}})$, and we have $h(m_{i_k}) \ge h(m_{i_{k+1}})$ otherwise. Therefore
0 cannot appear infinitely often during the play respecting $\sigma_1$.

The strategies we obtain from the above proof are not memoryless. It turns
out that memoryless winning strategies also exist. Let us state the following more
general theorem proved together with Hugo Gimbert.

Theorem 3. In a game with an $Acc_U \cup Acc_\Omega$ winning condition, from each
position one of the players has a memoryless winning strategy. Moreover, the
winning memoryless strategy for player 0 guarantees that no configuration appears
infinitely often on the play or the parity condition is satisfied.

In particular the theorem says that unboundedness and strict unboundedness
conditions are equivalent.

5 Büchi and Unboundedness

In this section we consider pushdown games with the condition: "green states
appear infinitely often and the stack is unbounded". Formally, we have a priority
function $\Omega$ assigning to the states of the game priorities 0 or 1. The set of winning
paths is:

$Acc(\Omega \wedge U) = Acc_\Omega \cap Acc_U$

Fix a pushdown game $G$ defined by a pushdown system $(Q, \Gamma, \Delta)$, a partition
of states $(Q_0, Q_1)$ and a priority function $\Omega : Q \to \{0, 1\}$. We will solve this
game by reducing it to a finite game. The following example hints why we do
the reduction in two steps. The example shows that in games with the $Acc(\Omega \wedge U)$
condition, all the winning strategies for player 0 may need infinite memory.
Example 1. Consider the pushdown system $A = (Q, \Gamma, \Delta)$ where $Q = \{p, q\}$,
$\Gamma = \{a, \bot\}$ and

$\Delta = \{(p, \bot, push(q, a)),\ (q, a, push(q, a)),\ (q, a, pop(q)),\ (q, a, pop(p))\}$

From state $p$ on letter $\bot$ the automaton pushes $a$ and goes to state $q$. Then it
can push and pop letters $a$ until it decides to change state back to $p$. If it arrives
at $p$ with a stack other than $\bot$, it is blocked.

We assume that all the states belong to player 0, i.e., $Q_0 = Q$ and $Q_1 = \emptyset$.
We take the Büchi condition $\Omega(p) = 0$ and $\Omega(q) = 1$. Thus $Acc(\Omega \wedge U)$ is the
set of plays where the stack is unbounded and state $p$ is visited infinitely often,
which in this case means that the stack must contain only $\bot$ infinitely often.
Therefore in order to win player 0 needs to go from the configuration $(p, \bot)$ to
a configuration $(q, a^i \bot)$ and back to $(p, \bot)$ repeatedly for bigger and bigger $i$.
It is easy to see that any such strategy requires infinite memory, since in the
configuration $(p, \bot)$ player 0 needs to remember which height he wants to reach
next.

Finally note that in the same pushdown game but with strict unboundedness 
condition instead of unboundedness, player 0 has no winning strategy. 
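
The winning play described in Example 1 can be sketched in a few lines of Python. This is only an illustration: the function name `example1_play` and the encoding of a configuration as a pair (state, number of letters a on the stack) are assumptions made here, not notation from the paper. The loop counter plays the role of the unbounded memory that player 0 needs.

```python
def example1_play(rounds):
    """Prefix of a play of Example 1 as (state, number of a's on the stack).

    In round i player 0 pushes until i letters a are on the stack, then pops
    back down to the bottom symbol. The counter i is the extra memory.
    """
    play = [("p", 0)]                 # configuration (p, bottom)
    for i in range(1, rounds + 1):
        # (p, bottom, push(q, a)) followed by i - 1 uses of (q, a, push(q, a))
        for height in range(1, i + 1):
            play.append(("q", height))
        # (q, a, pop(q)) down to height 1, then (q, a, pop(p)) back to (p, bottom)
        for height in range(i - 1, 0, -1):
            play.append(("q", height))
        play.append(("p", 0))
    return play
```

Every round visits state p once more and reaches a strictly greater stack height, so an infinite continuation of this pattern visits p infinitely often with an unbounded stack.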

Because of the complications presented in the example, the first step in solving
$G$ is to reduce it to another pushdown game $G^*$ as depicted below. This
game has two modes so as to let player 1 verify that player 0 can win the play
in mode $B$ with the Büchi condition or in mode $U$ with the unboundedness
condition. Only player 1 can change the mode and he can do it at any time. However,
if he changes it infinitely often, he loses.




[Figure: a fragment of $G^*$ with positions $U(p, au)$, $B?(q, bau)$ and $B?(q', u)$.]



Formally, for every mode $K \in \{B, U\}$ and for every configuration $(p, u)$ in
the pushdown game $G$, there is in $G^*$ an edge from $K?(p, u)$ to $K(p, u)$ and to
$\bar{K}(p, u)$, where $\bar{K}$ denotes the "other letter". For every edge from $(p, u)$ to $(q, v)$
in $G$, there is an edge in $G^*$ from $K(p, u)$ to $K?(q, v)$.

All the positions with an interrogative key $K?$ are for player 1. A position
$K(p, u)$ is for player 0 in $G^*$ if and only if $(p, u)$ is for player 0 in $G$.

The winning condition consists of the sequences of positions in the graph of
the following form $K_1?(p_1, u_1)\, K_2(p_1, u_1)\, K_2?(p_2, u_2) \cdots$ such that either

— there are finitely many visits of $U$ mode and $(p_1, u_1)(p_2, u_2) \cdots \in Acc(\Omega)$; or

— there are finitely many visits of $B$ mode and $(p_1, u_1)(p_2, u_2) \cdots \in Acc(U)$; or

— there are infinitely many $i$ such that $K_i = B$ and $K_{i+1} = U$.
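
The edge relation of $G^*$ defined above can be sketched for a finite fragment of $G$. The following Python function is a hypothetical illustration: the name `mode_game_edges` and the encoding of positions as triples are choices made here, and a real pushdown game graph is infinite, so only a finite set of edges can be handled this way.

```python
def mode_game_edges(edges):
    """Edges of G* for a finite fragment of G given as (source, target) pairs.

    Positions of G* are triples (mode, asking, config): asking=True encodes
    the interrogative position K?(config), asking=False encodes K(config).
    """
    configs = {c for edge in edges for c in edge}
    result = set()
    for mode in ("B", "U"):
        other = "U" if mode == "B" else "B"
        for c in configs:
            # In K?(c) player 1 keeps the mode or switches to the other one.
            result.add(((mode, True, c), (mode, False, c)))
            result.add(((mode, True, c), (other, False, c)))
        for src, dst in edges:
            # A move of G is copied and leads to an interrogative position.
            result.add(((mode, False, src), (mode, True, dst)))
    return result
```

Each configuration contributes four mode-choice edges and each edge of the fragment is copied once per mode, so a fragment with one edge between two configurations yields ten edges.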

The following theorem states the correctness of the reduction. 

Theorem 4. Player 0 has a winning strategy from a position $(p, u)$ in $G$ iff he
has a winning strategy from positions $B?(p, u)$ and $U?(p, u)$ in $G^*$.

The game $G^*$ can be reduced to a finite game $\tilde{G}$ in a similar way as we did in
the previous section [3].

6 Conclusions 

In the framework of pushdown games we have introduced a new condition ex- 
pressing the fact that the stack is unbounded. This condition is similar to, and in 
our opinion more natural than, the strict unboundedness condition from the pa- 
per of Cachat, Duparc and Thomas [6]. We have shown nevertheless that player 
0 has a winning strategy with one condition iff he has one with the other. This 
property extends to the case when these conditions are considered in union with 
a parity condition. It stops being true for conditions of the form $Acc_\Omega \cap Acc_U$.
We have proved that the problem of solving a pushdown game with boolean
combinations of Büchi and unboundedness conditions is EXPTIME-complete.
Unfortunately, we were able to fit just parts of proofs of two cases into the page 
limit. The complete proofs can be found in [3]. 

The proofs give strategies that are implementable using a stack. This is useful 
as such strategies are finitely described, and could be used for instance to define 
a controller. 

We have given methods to decide the winner from a given position, but only 
from one that has just one letter on the stack. One may be interested in having 
a uniform version, that is a full description of the set of winning positions. Using 
techniques of [14] one deduces from our algorithm an alternating automaton 
recognizing the set of winning positions, which is in fact a regular language. 
One may also note that this alternating automaton gives a method to define 
strategies using a stack for any winning position. 

Finally, let us comment on the restriction to Büchi and co-Büchi winning
conditions. We believe that the method presented here works for all parity
conditions and we hope to include the proof in the journal version of this paper.
Nevertheless we think that the Büchi/co-Büchi case is sufficiently interesting as
these kinds of conditions are enough to encode LTL and CTL properties.
Abstract. In this paper, we study the model-checking and parameter
synthesis problems of the logic TCTL over discrete-timed automata
where parameters are allowed both in the model and in the property. We 
show that the model-checking problem of TCTL extended with param- 
eters is undecidable over discrete-timed automata with only one para- 
metric clock. The undecidability result needs equality in the logic. When 
equality is not allowed, we show that the model-checking and the param- 
eter synthesis problems become decidable. 



1 Introduction 

In recent works, parametric real-time model-checking problems have been stud- 
ied by several authors. Alur et al study in [1] the analysis of timed automata 
where clocks are compared to parameters. They show that when only one clock 
is compared to parameters, the emptiness problem is decidable. But this prob- 
lem becomes undecidable when three clocks are compared to parameters. Hune 
et al study in [6] a subclass of parametric timed automata (L/U automata) such 
that each parameter occurs either as a lower bound or as an upper bound. Wang 
in [7,8], Emerson et al in [5], Alur et al in [2] and the authors of this paper in [4] 
study the introduction of parameters in temporal logics. The model-checking 
problem for TCTL extended with parameters over timed automata (without 
parameters) is decidable. On the other hand, only a fragment of LTL extended 
with parameters is decidable. 

Unfortunately, in all those previous works, the parameters are only in the 
model (timed automaton) or only in the property (temporal formula). Here, we 
study the model-checking problem and the parameter synthesis problem for the 
logic TCTL extended with parameters and for discrete-timed automata with one 
parametric clock. To the best of our knowledge, this is the first work studying 
these problems with parameters both in the model and in the property. 

Let us illustrate this on an example. The automaton $A$ of Figure 1 is a
discrete-timed automaton with one clock $x$ and two parameters $\theta_1$ and $\theta_2$. Here
we explicitly model the elapse of time by transitions labeled by 0 or 1. State $q_0$ is
labeled with atomic proposition $\sigma$ and in all other states this proposition is false.
Fig. 1. A parametric timed automaton



Let us consider three properties of the runs of $A$ starting at $q_0$ with clock value
$x = 0$, that are expressed by the following formulae of TCTL logic with parameters:

(i) $\forall\Box(\sigma \to \forall\Diamond_{\le \theta_3}\, \sigma)$

(ii) $\forall\theta_1 \forall\theta_2 \cdot (\theta_2 < \theta_1 \to \forall\Box(\sigma \to \forall\Diamond_{\le 2\theta_1 + 2}\, \sigma))$

(iii) $\forall\theta_1 \cdot (\theta_1 \ge 5 \to \forall\Box(\sigma \to \forall\Diamond_{< 2\theta_1 + 2}\, \sigma))$

The parameter synthesis problem associated to formula (i) asks for which values
of $\theta_1$, $\theta_2$ and $\theta_3$ the formula is true at configuration $(q_0, 0)$. By observing that
any cycle through the four states has a duration bounded by $\theta_1 + \theta_2 + 2$, we can
deduce the following constraint on the parameters: $\theta_3 \ge \theta_1 + \theta_2 + 2$. Formula
(ii) formalizes the question "Whenever the value assigned to $\theta_1$ is greater
than the value assigned to $\theta_2$, is it true that any cycle has a duration bounded
by $2\theta_1 + 2$?". As there is no free parameter in the question, it has
a YES-NO answer. This is a model-checking problem, with a yes answer here.
Finally, formula (iii) leaves parameter $\theta_2$ free and formalizes the question "What
are the possible values of $\theta_2$ such that for any value of $\theta_1 \ge 5$, any cycle lasts at
most $2\theta_1 + 1$ time units?". This is again a parameter synthesis problem and the
answer is the constraint $\theta_2 \le 4$.
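
The arithmetic behind the answer to formula (iii) can be checked by brute force. The sketch below is not the algorithm of the paper; it assumes that the worst-case cycle duration is exactly $\theta_1 + \theta_2 + 2$ (the text only states this value as a bound), reads the premise as $\theta_1 \ge 5$, and explores parameter values only up to a fixed bound. The name `satisfies_iii` is hypothetical.

```python
def satisfies_iii(theta2, bound=50):
    """Check, for 5 <= theta1 <= bound, that the assumed worst-case cycle
    duration theta1 + theta2 + 2 is at most 2 * theta1 + 1."""
    return all(t1 + theta2 + 2 <= 2 * t1 + 1 for t1 in range(5, bound + 1))

# Values of theta2 (searched below 20) for which the bounded check succeeds.
admissible = [t2 for t2 in range(0, 20) if satisfies_iii(t2)]
```

Under these assumptions the inequality simplifies to $\theta_2 \le \theta_1 - 1$, and the smallest admissible $\theta_1$ is the binding case.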

In this paper, we study the algorithmic treatment of such problems. Our 
results are as follows. On the negative side, we show that the model-checking 
problem of TCTL extended with parameters is undecidable over timed automata 
with only one parametric clock. The undecidability result needs equality in the 
logic. On the positive side, we show that when equality is not allowed in the 
logic, the model-checking problem becomes decidable and the parameter synthe- 
sis problem is solvable. Our algorithm is based on automata theoretic principles 
and an extension of our method (see [4]) to express durations of runs in a timed 
automaton using Presburger arithmetic. As a corollary, we obtain the decidabil- 
ity of the reachability problem for timed automata with one parametric clock 
proved by Alur et al in [1]. All the formulae given in the example above are in 
the decidable fragment. 
With the Presburger approach, we clearly indicate the borderline between
decidability and undecidability. Future work could address the following points.
Presburger theory is decidable, but with a high 3ExpTime complexity. More efficient
algorithms should be designed for particular fragments of TCTL extended with
parameters. The extension to dense timed models should be investigated.

The paper is organized as follows. In Section 2, we introduce parametric timed 
automata and parametric TCTL logic. In Section 3, we show the undecidability 
of the model-checking problem and we solve the problem algorithmically when 
equality is forbidden in the logic. The proofs of two important propositions of 
Section 3 are postponed to Section 4. A complete version of the article is available 
at http://www.ulb.ac.be/di/ssd/cfv/TechReps/TechRep_CFV_2003_14.ps.

2 Parameters Everywhere 

In this section, we introduce parameters in the automaton used to model the 
system as well as in the logic used to specify properties of the system. The 
automata are parametric timed automata as defined in [1] with a discrete time 
domain and one parametric clock. The logic is Parametric Timed CTL Logic, 
PTCTL for short, as defined in [4]. 

Notation 1. Let $\Theta$ be a fixed finite set of parameters $\theta$ that are shared by the
automaton and the logical formulae. A parameter valuation for $\Theta$ is a function
$v : \Theta \to \mathbb{N}$ which assigns a natural number to each parameter $\theta \in \Theta$. In the sequel,
$\alpha, \beta, \ldots$ mean any linear term $\sum_{i \in I} c_i \theta_i + c$, with $c_i, c \in \mathbb{N}$ and $\{\theta_i \mid i \in I\} \subseteq \Theta$. A
parameter valuation $v$ is naturally extended to linear terms by defining $v(c) = c$
for any $c \in \mathbb{N}$. We denote by $x$ the unique parametric clock. The same notation $x$
is used for both the clock and a value of the clock. A guard $g$ is any conjunction
of $x \sim \alpha$ with $\sim\, \in \{=, <, \le, >, \ge\}$. We denote by $G$ the set of guards. Notation
$x \models_v g$ means that $x$ satisfies $g$ under valuation $v$. We use notation $\Sigma$ for the
set of atomic propositions.
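
The satisfaction of a guard under a valuation can be made concrete with a small evaluator. This is a minimal sketch under assumptions made here: a linear term is encoded as a pair (coefficients, constant), a guard as a list of atoms, and the names `eval_term` and `satisfies` are hypothetical.

```python
import operator

# Comparison symbols allowed in guards, written in ASCII.
COMPARISONS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
               ">": operator.gt, ">=": operator.ge}

def eval_term(term, valuation):
    """Value of a linear term (coeffs, constant) under a parameter valuation."""
    coeffs, constant = term
    return sum(c * valuation[name] for name, c in coeffs.items()) + constant

def satisfies(x, guard, valuation):
    """x |=_v g, where g is a list of atoms (comparison symbol, linear term)."""
    return all(COMPARISONS[op](x, eval_term(term, valuation))
               for op, term in guard)
```

For instance, with the valuation {"t1": 2, "t2": 1}, the guard [(">=", ({"t1": 1}, 0)), ("<", ({"t1": 1, "t2": 2}, 3))] requires the clock value to lie between 2 and 6.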

In the next definition of parametric timed automaton, we make the hypothesis 
that non-parametric clocks have all been eliminated, see [1] for details. 

Definition 1. A parametric timed automaton $A$ is a tuple $(Q, E, L, I)$, where
$Q$ is a finite set of states, $E \subseteq Q \times \{0, 1\} \times G \times 2^{\{x\}} \times Q$ is a finite set of edges,
$L : Q \to 2^{\Sigma}$ is a labeling function and $I : Q \to G$ assigns an invariant $I(q) \in G$
to each state $q$. A configuration is a pair $(q, x)$ with $q$ a state and $x$ a clock value.

Whenever a parameter valuation $v$ is given, $A$ becomes a usual one-clock
timed automaton denoted by $A^v$. We recall the next definitions for $A^v$.

Definition 2. A transition $(q, x) \xrightarrow{\tau} (q', x')$ between two configurations $(q, x)$
and $(q', x')$ with time increment $\tau \in \{0, 1\}$ is allowed in $A^v$ if (1) $x \models_v I(q)$
and $x' \models_v I(q')$, (2) there is an edge $(q, \tau, g, r, q') \in E$ such that $x + \tau \models_v g$
and $x' = 0$ if $r = \{x\}$, $x' = x + \tau$ if $r = \emptyset$.² A run $\rho = (q_i, x_i)_{i \ge 0}$ is
a sequence of transitions $(q_i, x_i) \xrightarrow{\tau_i} (q_{i+1}, x_{i+1})$ such that $\sum_{i \ge 0} \tau_i = \infty$.³ The
duration $t = D_\rho(q_i, x_i)$ at configuration $(q_i, x_i)$ of $\rho$ is equal to $t = \sum_{0 \le j < i} \tau_j$. A
finite run $\rho = (q, x) \leadsto (q', x')$ is a finite sequence of transitions such that $(q, x)$
(resp. $(q', x')$) is its first (resp. last) configuration. Its duration $D_\rho$ is equal to
$D_\rho(q', x')$.

² Time increment $\tau$ is first added to $x$, guard $g$ is then tested, and finally $x$ is reset
according to $r$.
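
The transition rule of Definition 2 can be sketched as a successor function. The code below is an illustration under assumptions made here: guards and invariants are passed as Python callables, an edge is a tuple (q, tau, guard, reset, q2) with a boolean reset flag, and the names `successors` and `duration` are hypothetical. It follows the order stated in the text: the increment is added, the guard is tested, then the clock is possibly reset.

```python
def successors(config, edges, invariant, valuation):
    """Configurations reachable in one transition, with the time increment.

    edges: list of (q, tau, guard, reset, q2) where guard(x, v) -> bool and
    reset is True when the clock is reset; invariant(q, x, v) -> bool.
    """
    q, x = config
    if not invariant(q, x, valuation):
        return []
    result = []
    for (src, tau, guard, reset, dst) in edges:
        if src != q or not guard(x + tau, valuation):
            continue                      # guard is tested after adding tau
        x2 = 0 if reset else x + tau      # then the clock is possibly reset
        if invariant(dst, x2, valuation):
            result.append((tau, (dst, x2)))
    return result

def duration(increments):
    """Duration of a finite run given its sequence of time increments."""
    return sum(increments)
```

A finite run can then be explored by iterating `successors` and accumulating the increments with `duration`.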

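In the next definition, $\sigma$ is any atomic proposition and $\alpha, \beta$ are linear terms.

Definition 3. [4] A PTCTL formula $f$ is of the form $f = Q_1\theta_1 \cdots Q_k\theta_k\, \varphi$
such that $k \ge 0$, $\{\theta_1, \ldots, \theta_k\} \subseteq \Theta$, $Q_j \in \{\exists, \forall\}$ for each $j$, $1 \le j \le k$, and $\varphi$ is
given by the following grammar

$\varphi ::= \sigma \mid \alpha \sim \beta \mid \neg\varphi \mid \varphi \vee \varphi \mid \exists\bigcirc\varphi \mid \varphi\, \exists U_{\sim\alpha}\, \varphi \mid \varphi\, \forall U_{\sim\alpha}\, \varphi$

Usual operators $\exists U$ and $\forall U$ are obtained as $\exists U_{\ge 0}$ and $\forall U_{\ge 0}$. We use the
abbreviations $\exists\Diamond_{\sim\alpha}\varphi$ for $\top\, \exists U_{\sim\alpha}\, \varphi$, $\forall\Diamond_{\sim\alpha}\varphi$ for $\top\, \forall U_{\sim\alpha}\, \varphi$, $\exists\Box_{\sim\alpha}\varphi$ for
$\neg\forall\Diamond_{\sim\alpha}\neg\varphi$ and $\forall\Box_{\sim\alpha}\varphi$ for $\neg\exists\Diamond_{\sim\alpha}\neg\varphi$. Notation QF-PTCTL means the set of
quantifier-free formulae $\varphi$ of PTCTL. The set of parameters of $\Theta$ that are free in $f$,
i.e. not under the scope of a quantifier, is denoted by $\Theta_f$. Thus, for a QF-PTCTL formula
$\varphi$, we have $\Theta_\varphi = \Theta$. We now give the semantics of PTCTL.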

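Definition 4. Let $A$ be a parametric timed automaton and $(q, x)$ be a configuration
of $A$. Let $f = Q_1\theta_1 \cdots Q_k\theta_k\, \varphi$ be a PTCTL formula. Given a parameter
valuation $v$ on $\Theta_f$, the satisfaction relation $(q, x) \models_v f$ is defined inductively as
follows. If $f = \varphi$, then $(q, x) \models_v \varphi$ according to the following rules:

— $(q, x) \models_v \sigma$ iff there exists⁴ a run $\rho = (q_i, x_i)_{i \ge 0}$ in $A^v$ with $(q, x) = (q_0, x_0)$
and $\sigma \in L(q)$

— $(q, x) \models_v \alpha \sim \beta$ iff there exists a run $\rho = (q_i, x_i)_{i \ge 0}$ in $A^v$ with $(q, x) = (q_0, x_0)$
and $v(\alpha) \sim v(\beta)$

— $(q, x) \models_v \neg\varphi$ iff $(q, x) \not\models_v \varphi$

— $(q, x) \models_v \varphi \vee \psi$ iff $(q, x) \models_v \varphi$ or $(q, x) \models_v \psi$

— $(q, x) \models_v \exists\bigcirc\varphi$ iff there exists a run $\rho = (q_i, x_i)_{i \ge 0}$ in $A^v$ with $(q, x) = (q_0, x_0)$
and $(q_1, x_1) \models_v \varphi$

— $(q, x) \models_v \varphi\, \exists U_{\sim\alpha}\, \psi$ iff there exists a run $\rho = (q_i, x_i)_{i \ge 0}$ in $A^v$ with $(q, x) = (q_0, x_0)$,
there exists $i \ge 0$ such that $D_\rho(q_i, x_i) \sim v(\alpha)$, $(q_i, x_i) \models_v \psi$ and
$(q_j, x_j) \models_v \varphi$ for all $j < i$

— $(q, x) \models_v \varphi\, \forall U_{\sim\alpha}\, \psi$ iff for any run $\rho = (q_i, x_i)_{i \ge 0}$ in $A^v$ with $(q, x) = (q_0, x_0)$,
there exists $i \ge 0$ such that $D_\rho(q_i, x_i) \sim v(\alpha)$, $(q_i, x_i) \models_v \psi$ and $(q_j, x_j) \models_v \varphi$
for all $j < i$

If $f = \exists\theta f'$, then $(q, x) \models_v f$ iff there exists $c \in \mathbb{N}$ such that $(q, x) \models_{v'} f'$ where
$v'$ is defined on $\Theta_{f'}$ by $v' = v$ on $\Theta_f$ and $v'(\theta) = c$. If $f = \forall\theta f'$, then $(q, x) \models_v f$
iff for all $c \in \mathbb{N}$, $(q, x) \models_{v'} f'$ where $v'$ is defined on $\Theta_{f'}$ by $v' = v$ on $\Theta_f$ and
$v'(\theta) = c$.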

³ Non-Zenoness property.

⁴ We verify the existence of a run starting in $(q, x)$ to ensure that time can progress
in $A^v$ from that configuration.
The problems that we want to solve are the following ones. The first problem
is the model-checking problem for PTCTL formulae $f$ with no free parameters.
In this case, we omit the index $v$ in the relation $(q, x) \models f$ since no parameter
has to receive a valuation. The second problem is the more general problem of
parameter synthesis for PTCTL formulae $f$ such that $\Theta_f$ is any subset of $\Theta$.

Problem 1. The model-checking problem is the following. Given a parametric
timed automaton $A$ and a PTCTL formula $f$ such that $\Theta_f = \emptyset$, given a
configuration $(q, x)$ of $A$, does $(q, x) \models f$ hold?

Problem 2. The parameter synthesis problem is the following. Given a parametric
timed automaton $A$ and a configuration $(q, x)$ of $A$, given a PTCTL formula
$f$, compute a symbolic representation of the set of parameter valuations $v$ on
$\Theta_f$ such that $(q, x) \models_v f$.⁵

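Example. We consider the parametric timed automaton $A$ of Figure 1 and the
PTCTL formulae equal to

$f : \forall\theta_1 \forall\theta_2 \cdot (\theta_2 < \theta_1 \to \forall\Box(\sigma \to \forall\Diamond_{\le 2\theta_1 + 2}\, \sigma))$,
$g : \forall\theta_1 \cdot (\theta_1 \ge 5 \to \forall\Box(\sigma \to \forall\Diamond_{< 2\theta_1 + 2}\, \sigma))$.

Then $\Theta = \{\theta_1, \theta_2\}$, $\Theta_f = \emptyset$ and $\Theta_g = \{\theta_2\}$. The model-checking problem "does
$(q_0, 0) \models f$ hold" has a yes answer. The parameter synthesis problem "for which
parameter valuations $v$ on $\Theta_g$ does $(q_0, 0) \models_v g$ hold" receives the answer $\theta_2 \le 4$.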

3 Decision Problems 

In this section, we will show that the model-checking problem is undecidable. The
undecidability comes from the use of equality in the operators $\exists U_{\sim\alpha}$ and $\forall U_{\sim\alpha}$.
When equality is forbidden in these operators, we will prove that the model-checking
problem becomes decidable. In this case, we will also positively solve the
parameter synthesis problem. In the sequel, we use subscripts to indicate which
restrictions are imposed on $\sim$ in operators $\exists U_{\sim\alpha}$ and $\forall U_{\sim\alpha}$. For instance,
notation PTCTL$_{\{=\}}$ means that $\sim$ can only be equality.

Theorem 1. The model-checking problem for PTCTL$_{\{=\}}$ is undecidable.

The proof of this theorem relies on the undecidability of Presburger arithmetic
with divisibility, an extension of Presburger arithmetic with the integer
divisibility relation $z \mid z'$ meaning "$z$ divides $z'$" (see [3]). More precisely, we show in
the proof that for any sentence $\varphi$ of Presburger arithmetic with divisibility, we
can construct a parametric timed automaton $A$, a configuration $(q, x_0)$ and a
PTCTL formula $f$ such that $\varphi$ is true iff the answer to the model-checking
problem $(q, x_0) \models f$ for $A$ is yes.

Let us turn to the fragment PTCTL$_{\{<, \le, >, \ge\}}$. To provide solutions to the
model-checking problem and the parameter synthesis problem for this fragment,
our approach is as follows. Given a formula $\varphi$ of QF-PTCTL$_{\{<, \le, >, \ge\}}$, we are
going to construct a Presburger formula $\Delta_{q,\varphi}(x, \Theta)$ with free variables $x$ and all
$\theta \in \Theta$ such that

$(q, x_0) \models_v \varphi$ iff $\Delta_{q,\varphi}(x_0, v(\Theta))$ is TRUE

for any valuation $v$ on $\Theta$ and any value $x_0$ of the clock (see Theorem 2). Solutions
to Problems 1 and 2 will be obtained as a corollary (see Corollaries 1 and 2) since
Presburger arithmetic is decidable. Considering the model-checking problem for
instance, if $Q\Theta\, \varphi$ is a PTCTL formula $f$ with no free parameters, then testing whether
$(q, x_0) \models f$ is equivalent to testing whether the Presburger sentence $Q\Theta\, \Delta_{q,\varphi}(x_0, \Theta)$ is
TRUE.

⁵ For instance this representation could be given in a decidable logical formalism.

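Example. Consider the parametric timed automaton of Figure 1 and the
QF-PTCTL formula $\varphi$ equal to $\forall\Box(\sigma \to \forall\Diamond_{\le \theta_3}\, \sigma)$. Then $\Theta = \{\theta_1, \theta_2, \theta_3\}$. The Presburger
formula $\Delta_{q_0,\varphi}(x, \Theta)$ is equal to $\theta_1 + \theta_2 + 2 \le \theta_3$ with no reference to $x$ since it
is reset along the edge from $q_0$ to $q_1$. Thus $(q_0, x_0) \models_v \varphi$ for any clock value $x_0$
and any valuation $v$ such that $v(\theta_1) + v(\theta_2) + 2 \le v(\theta_3)$. The model-checking
problem $(q_0, x_0) \models \forall\theta_1 \forall\theta_2 \exists\theta_3\, \varphi$ has a yes answer for any $x_0$ because the sentence
$\forall\theta_1 \forall\theta_2 \exists\theta_3 \cdot (\theta_1 + \theta_2 + 2 \le \theta_3)$ is TRUE in Presburger arithmetic. If $x$ was not
reset along the edge from $q_0$ to $q_1$, then the formula $\Delta_{q_0,\varphi}(x, \Theta)$ would be equal
to $(\theta_1 + \theta_2 + 2 \le \theta_3) \wedge (x \le \theta_1)$ and the above model-checking problem would
have a yes answer iff $\forall\theta_1 \forall\theta_2 \exists\theta_3 \cdot (\theta_1 + \theta_2 + 2 \le \theta_3) \wedge (x_0 \le \theta_1)$, that is $x_0 = 0$.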
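
The two sentences of this example can be explored with a bounded brute-force search. This is only a sanity check under assumptions made here, not a decision procedure for Presburger arithmetic: the universally quantified parameters are explored up to a fixed `BOUND`, the inequalities are read as non-strict, and the names `delta`, `delta_no_reset` and `holds_forall_exists` are hypothetical.

```python
from itertools import product

def delta(t1, t2, t3):
    """theta1 + theta2 + 2 <= theta3, the formula of the example."""
    return t1 + t2 + 2 <= t3

def delta_no_reset(x0, t1, t2, t3):
    """Variant of the example where the clock is not reset."""
    return delta(t1, t2, t3) and x0 <= t1

BOUND = 15  # theta1 and theta2 are only explored up to this value

def holds_forall_exists(pred):
    """Bounded check of 'forall theta1 theta2 exists theta3 . pred'."""
    return all(any(pred(t1, t2, t3) for t3 in range(2 * BOUND + 3))
               for t1, t2 in product(range(BOUND + 1), repeat=2))

# Clock values x0 (searched below 5) for which the variant sentence holds.
yes_for = [x0 for x0 in range(5)
           if holds_forall_exists(lambda a, b, c: delta_no_reset(x0, a, b, c))]
```

The witness for the existential quantifier is $\theta_3 = \theta_1 + \theta_2 + 2$, which is why the search range for $\theta_3$ extends to twice the bound plus two.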

As indicated by this example, the Presburger formula $\Delta_{q,\varphi}(x, \Theta)$ constructed
from the QF-PTCTL formula $\varphi$ is a boolean combination of terms of the form
$\theta \sim \alpha$ or $x \sim \alpha$ where $\theta$ is a parameter, $x$ is the clock and $\alpha$ is a linear
term over parameters. Formula $\Delta_{q,\varphi}(x, \Theta)$ must be seen as a syntactic
translation of formula $\varphi$ to Presburger arithmetic. The question "does $(q, x_0) \models f$
hold" with $f = Q\Theta\, \varphi$ is translated into the question "is the Presburger
sentence $Q\Theta\, \Delta_{q,\varphi}(x_0, \Theta)$ true". At this point only, semantic inconsistencies inside
$Q\Theta\, \Delta_{q,\varphi}(x_0, \Theta)$ are looked for to check whether this sentence is true or not.

Our proofs require working with a set $G$ of guards that is more general than
in Notation 1.

Notation 2. Linear terms $\alpha, \beta, \ldots$ are any $\sum_{i \in I} c_i \theta_i + c$, with $c_i, c \in \mathbb{Z}$ (instead of
$\mathbb{N}$). Comparison symbol $\sim$ used in expressions like $x \sim \alpha$ and $\alpha \sim \beta$ belongs to
the extended set $\{=, <, \le, >, \ge, \equiv_{a,\le}, \equiv_{a,\ge}\}$. For any constant $a \in \mathbb{N}^+$, notation
$z \equiv_{a,\le} z'$ means $z \equiv z' \bmod a$ and $z \le z'$. Equivalently, this means that there
exists $y \in \mathbb{N}$ such that $z + ay = z'$. Notation $z \equiv_{a,\ge} z'$ means $z \equiv z' \bmod a$
and $z \ge z'$. Any $x \sim \alpha$ is called an $x$-atom, any $\alpha \sim \beta$ is called a $\theta$-atom.
An $x$-conjunction is any conjunction of $x$-atoms, and a $\theta$-conjunction is any
conjunction of $\theta$-atoms. We denote by $B_{x,\Theta}$ the set of boolean combinations of
$x$-atoms and $\theta$-atoms. A guard is any element of $B_{x,\Theta}$. Thus the set $G$ of Notation
1 is now equal to the set $B_{x,\Theta}$.⁶

⁶ The more general notion of guard is explicitly used in Proposition 5. This
proposition is the basic tool of Propositions 1 and 2 on which our main Theorem 2 is
based.
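
The equivalence stated in Notation 2 between the congruence-with-order relation and the existence of a witness $y$ can be checked on small integers. The sketch below assumes that $\mathbb{N}$ contains 0, so that the order is read as non-strict; the function names `congruent_le` and `has_witness` are hypothetical.

```python
def congruent_le(z, z2, a):
    """z and z2 are congruent modulo a and z <= z2 (a > 0 assumed)."""
    return (z2 - z) % a == 0 and z <= z2

def has_witness(z, z2, a):
    """Existence of a natural number y with z + a*y == z2 (a > 0 assumed)."""
    if z2 < z:
        return False
    # y cannot exceed (z2 - z) // a, so a finite search is enough.
    return any(z + a * y == z2 for y in range((z2 - z) // a + 1))
```

Both functions agree on every triple of small integers, which is the equivalence used to express these comparisons inside Presburger arithmetic.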
From now on, it is supposed that the guards and the invariants appearing in
parametric timed automata belong to the generalized set $G = B_{x,\Theta}$. It should
be noted that the extension of $\sim$ to $\{=, <, \le, >, \ge, \equiv_{a,\le}, \equiv_{a,\ge}\}$ is only valid
inside automata, and not inside PTCTL formulae. For short, we call automaton
any parametric timed automaton $A$. Let us state our main result.

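Theorem 2. Let $A$ be an automaton and $q$ be a state of $A$. Let $\varphi$ be a
QF-PTCTL$_{\{<, \le, >, \ge\}}$ formula. Then there exists a $B_{x,\Theta}$ formula $\Delta_{q,\varphi}(x, \Theta)$ with
free variables $x$ and all $\theta \in \Theta$ such that

$(q, x_0) \models_v \varphi$ iff $\Delta_{q,\varphi}(x_0, v(\Theta))$ is TRUE

for any valuation $v$ on $\Theta$ and any clock value $x_0$. The construction of formula
$\Delta_{q,\varphi}$ is effective.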

The proof of Theorem 2 is based on the next two propositions. Their proof
is postponed to Section 4.

Proposition 1. Let $A$ be an automaton and $q$ be a state. Then there exists a
$B_{x,\Theta}$ formula $Run_q(x, \Theta)$ such that for any valuation $v$ and any clock value $x_0$,
$Run_q(x_0, v(\Theta))$ is TRUE iff there exists an infinite run in $A^v$ starting with $(q, x_0)$.
The construction of $Run_q(x, \Theta)$ is effective.

Proposition 2. Let $A$ be an automaton and $q, q'$ be two states. Let $\sim\, \in \{<, \le, >, \ge\}$
and $\alpha$ be a linear term. Then there exists a $B_{x,\Theta}$ formula $Duration^{\sim\alpha}_{q,q'}(x, \Theta)$
such that for any valuation $v$ and any clock value $x_0$, $Duration^{\sim\alpha}_{q,q'}(x_0, v(\Theta))$ is
TRUE iff there exists a finite run $\rho = (q, x_0) \leadsto (q', \cdot)$ in $A^v$ with $D_\rho \sim v(\alpha)$. The
construction of $Duration^{\sim\alpha}_{q,q'}(x, \Theta)$ is effective.

Proof (of Theorem 2). The proof is by induction on the way formula $\varphi$ is
constructed. We only explain two cases.

Let us treat the case $\varphi = \exists\bigcirc\psi$. Recall that $(q, x_0) \models_v \exists\bigcirc\psi$ iff there
exists a transition $(q, x_0) \xrightarrow{\tau} (q', x'_0)$ such that $(q', x'_0) \models_v \psi$ and $(q', x'_0)$ is the
first configuration of an infinite run $\rho'$. Let $(q, \tau, g, r, q')$ be the edge of $E$ that
has led to the transition $(q, x_0) \xrightarrow{\tau} (q', x'_0)$. Then (see Definition 2), $x'_0 = 0$ if
$r = \{x\}$, and $x'_0 = x_0 + \tau$ if $r = \emptyset$. By induction hypothesis, $\Delta_{q',\psi}$ has been
constructed such that $\Delta_{q',\psi}(x'_0, v(\Theta))$ is TRUE iff $(q', x'_0) \models_v \psi$. The automaton
$A$ is modified into an automaton $\bar{A}$ as follows. A copy⁷ $\bar{q}'$ of $q'$ is added to $Q$
such that $L(\bar{q}') = L(q')$, $I(\bar{q}') = I(q') \wedge \Delta_{q',\psi}(x, \Theta)$. A copy $(\bar{q}', \tau', g', r', p)$ is
also added for any edge $(q', \tau', g', r', p)$ leaving $q'$. By Proposition 1 applied to
$\bar{A}$ and $\bar{q}'$, we get a $B_{x,\Theta}$ formula $Run_{\bar{q}'}$ such that $Run_{\bar{q}'}(x'_0, v(\Theta))$ is TRUE iff
there exists an infinite run in $\bar{A}^v$ starting with $(\bar{q}', x'_0)$. By construction of $\bar{q}'$,
equivalently there exists an infinite run in $A^v$ starting with $(q', x'_0)$ and such
that $(q', x'_0) \models_v \psi$. Hence, the expected formula $\Delta_{q,\varphi}(x, \Theta)$ is equal to

Aq,p{x, 0) = V(q.r.s,r.g')GiS„ (Ad) ^ Run;j/(0, 0)) 

V y{g,r,g.r,q')eE\En A Run^, (x -k T, 0)) 

where Er is the set of edges that reset the clock. 

⁷ The copy $\bar{q}'$ of $q'$ is needed to focus on the first configuration $(q', x'_0)$ of $\rho'$.
Let us turn to the case $\varphi = \psi_1\, \exists U_{\sim\alpha}\, \psi_2$. We have $(q, x_0) \models_v \psi_1\, \exists U_{\sim\alpha}\, \psi_2$ iff either (1)
$0 \sim v(\alpha)$, $(q, x_0) \models_v \psi_2$ and $(q, x_0)$ is the first configuration of an infinite run,
or (2) there exists a finite run $\rho = (q, x_0) \leadsto (q', x'_0)$ such that $D_\rho \sim v(\alpha)$, $\psi_1$ is
satisfied at any configuration of $\rho$ distinct from $(q', x'_0)$, $\psi_2$ is satisfied at $(q', x'_0)$
and $(q', x'_0)$ is the first configuration of an infinite run. For any state $p \in Q$,
formulae $\Delta_{p,\psi_1}$ and $\Delta_{p,\psi_2}$ have been constructed by induction hypothesis. Case
(1) is solved as done before for operator $\exists\bigcirc$. Case (2) is more involved. The
automaton $A$ is first modified into $\bar{A}$ as for operator $\exists\bigcirc$ (with $q', \psi_2$ instead of
$q', \psi$) to get formula $Run_{\bar{q}'}$ such that $Run_{\bar{q}'}(x'_0, v(\Theta))$ is TRUE iff there exists
an infinite run in $A^v$ starting with $(q', x'_0)$ and such that $(q', x'_0) \models_v \psi_2$. The
automaton $A$ is then modified into another automaton $\hat{A}$ in the following way. A
copy $\hat{q}'$ of $q'$ is added to $Q$ as well as a copy of any edge of $E$ entering $q'$ as
entering $\hat{q}'$; we define $L(\hat{q}') = L(q')$ and $I(\hat{q}') = I(q') \wedge Run_{\bar{q}'}(x, \Theta)$.⁸ For any
state $p$ of $Q$, $I(p)$ is replaced by $I(p) \wedge \Delta_{p,\psi_1}(x, \Theta)$. Thanks to Proposition 2
applied to $\hat{A}$, we obtain a formula $Duration^{\sim\alpha}_{q,\hat{q}'}(x, \Theta)$ expressing the following:
$Duration^{\sim\alpha}_{q,\hat{q}'}(x_0, v(\Theta))$ is TRUE iff there exists in $\hat{A}^v$ a finite run $\rho = (q, x_0) \leadsto (\hat{q}', x'_0)$
with $D_\rho \sim v(\alpha)$. Equivalently it expresses (2). For case (2), the expected
formula is thus the disjunction $\bigvee_{q' \in Q} Duration^{\sim\alpha}_{q,\hat{q}'}(x, \Theta)$.

Solutions to Problems 1 and 2 are easily obtained from Theorem 2 because 
any $B_{x,\theta}$ formula is a Presburger formula. 

Corollary 1. The model-checking problem for y is decidable. 

Corollary 2. Let $A$ be an automaton and $(q, x_0)$ be a configuration of $A$. Let 
$\{\theta_1, \dots, \theta_k\} \subseteq \theta$ with $k \geq 0$ and let $f = Q_1\theta_1 \cdots Q_k\theta_k\, \varphi$ be a 
$\mathrm{PTCTL}_{\{<,\leq,\geq,>\}}$ formula. Then the Presburger formula 
$Q_1\theta_1 \cdots Q_k\theta_k\, \lambda_{q,\varphi}(x_0,\theta)$ with free parameters in $\theta_f$ is an effective 
characterization of the set of valuations $v$ on $\theta_f$ such that $(q, x_0) \models_v f$. 

Let us denote by $V(A,f,q,x_0)$ the set of valuations $v$ on $\theta_f$ such that 
$(q,x_0) \models_v f$. Let $\theta_f$ be equal to $\{\theta'_1, \dots, \theta'_l\}$. Presburger arithmetic has an 
effective quantifier elimination, obtained by adding to the operations $+$ and $<$ all the 
congruences $\equiv \bmod\ a$, $a \in \mathbb{N}^+$. It follows that the characterization of $V(A,f,q,x_0)$ 
given above by $Q_1\theta_1 \cdots Q_k\theta_k\, \lambda_{q,\varphi}(x_0,\theta)$ can be effectively rewritten without 
any quantifier. On the other hand, since Presburger arithmetic has a decidable 
theory, any question formulated in this logic about $V(A,f,q,x_0)$ is decidable. For 
instance, the question “Is the set $V(A,f,q,x_0)$ non empty” is decidable as it is 
formulated in Presburger arithmetic by $\exists\theta'_1 \cdots \exists\theta'_l\; Q_1\theta_1 \cdots Q_k\theta_k\, \lambda_{q,\varphi}(x_0,\theta)$. 
The question “Does the set $V(A,f,q,x_0)$ contain all the valuations on $\theta_f$” is 
also decidable as it can be formulated as $\forall\theta'_1 \cdots \forall\theta'_l\; Q_1\theta_1 \cdots Q_k\theta_k\, \lambda_{q,\varphi}(x_0,\theta)$. 

The question “Is the set $V(A,f,q,x_0)$ finite” is translated into 
$\exists z\ \forall\theta'_1 \cdots \forall\theta'_l\; Q_1\theta_1 \cdots Q_k\theta_k\, (\lambda_{q,\varphi}(x_0,\theta) \Rightarrow \bigwedge_i \theta'_i < z)$. 

® The copy q' of q' is needed to focus on the last configuration {q',x'ff) of p; the 
augmented invariant is needed to express that p is satisfied at {q' , x'f) and (g', x'q) is 
the first configuration of an infinite run. 
4 Durations 

The aim of this section is to sketch the proofs of Propositions 1 and 2. This will 
be achieved thanks to a precise description of the possible durations of finite 
runs in an automaton. Several steps are necessary for this purpose; they are 
detailed in the next subsections. In these subsections, we make the hypothesis 
that the automata are normalized, that is, (i) the guards labeling the edges 
and used in the invariants are limited to conjunctions of x-atoms and $\theta$-atoms 
with $\sim\ \in \{=, <, >, \equiv_{a,<}, \equiv_{a,>}\}$, (ii) for any state $q \in Q$, the edges $(p,\tau,g,r,q)$ 
entering $q$ are all labeled by the same $g$ and the same $r$ (however $\tau$ can vary). 
This normalization allows a simplified presentation of the proofs. 

Proposition 3. Any automaton can be effectively normalized. 



4.1 Durations in Reset-Free Automata 

In this subsection, we restrict to reset-free normalized automata $A$, that is 
automata in which there is no reset of the clock. For this family of automata, we 
study the set $\mathcal{R}(A^v, x_0)$ of finite runs of the form $(i, x_0) \leadsto (f, \cdot)$ such that $i \in I$, 
$f \in F$, where $I$, $F$ are two fixed subsets of states, $x_0$ is a fixed clock value and 
$v$ is a given valuation. 

As $A$ is normalized and reset-free, given a state $q$, all edges $(p,\tau,g,r,q)$ 
entering $q$ have the same guard $g$ and satisfy $r = \emptyset$. It follows that we can 
move guard $g$ from these edges to the invariant $\mathcal{I}(q)$ of $q$, by simply erasing $g$ 
on the edges and adding $g$ as a conjunct to $\mathcal{I}(q)$. In other words, a reset-free 
normalized automaton has its edges labeled by time increment $\tau \in \{0,1\}$ and 
its states labeled by invariants which are conjunctions of x-atoms and $\theta$-atoms. 

What we do first is a sequence of transformations on $A$ that preserve 
$\mathcal{R}(A^v, x_0)$. The aim of these transformations is to simplify the form of the 
invariants used in the automaton in the following sense: (i) the invariant $\mathcal{I}(q)$ of any 
state $q \in Q \setminus (I \cup F)$ is the conjunction of at most one x-atom (necessarily of the 
form $x = \alpha$) and one $\theta$-conjunction; additional x-atoms of the form $x > \beta$ (resp. 
$x < \beta$) are allowed in $\mathcal{I}(q)$ if $q \in I$ (resp. $q \in F$), (ii) for any run $\rho \in \mathcal{R}(A^v, x_0)$, 
for any x-atom $x = \alpha$, there exists at most one configuration $(q',x')$ of $\rho$ such 
that $\mathcal{I}(q')$ contains $x = \alpha$. 

Proposition 4. Any reset-free normalized automaton can be effectively 
simplified. 

The transformations necessary to the proof of this proposition are based on 
standard constructions of automata theory. As the automaton is normalized, 
any invariant is a conjunction of x-atoms and $\theta$-atoms. Given such an invariant 
$\mathcal{I}(q)$, x-atoms of the form $x \equiv_{a,<} \alpha$ (resp. $x \equiv_{a,>} \alpha$) are first eliminated, then 
x-atoms $x > \alpha$ (resp. $x < \alpha$) are eliminated except if $q \in I$ (resp. $q \in F$), finally 
x-atoms $x = \alpha$ are treated. This sequence of transformations is possible because 
the automaton is reset-free. For instance, any x-atom of the form $x \equiv_{a,<} \alpha$ 
belonging to $\mathcal{I}(q)$ can be eliminated thanks to the following idea. If $\alpha \equiv b \bmod a$ 
for some $b \in \{0, 1, \dots, a-1\}$, then 

$x \equiv_{a,<} \alpha$ iff $x \equiv b \bmod a$ and $x < \alpha$. 

The automaton is thus transformed in a way to compute modulo $a$ and the 
x-atom $x \equiv_{a,<} \alpha$ is eliminated at the cost of a new x-atom $x < \alpha$. 
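The rewriting step above can be checked by brute force on small values. The sketch below is only an illustration: it assumes that the atom $x \equiv_{a,<} \alpha$ means "$x \equiv \alpha \bmod a$ and $x < \alpha$" (its definition lies outside this section), and the function names are hypothetical.

```python
def atom_mod_less(x, a, alpha):
    # Assumed reading of the atom x =_{a,<} alpha:
    # x is congruent to alpha modulo a, and x < alpha.
    return (x - alpha) % a == 0 and x < alpha


def rewritten_atom(x, a, alpha):
    # The rewriting from the text: pick b in {0, ..., a-1} with
    # alpha = b mod a, then test x = b mod a together with x < alpha.
    b = alpha % a
    return x % a == b and x < alpha


# Compare both forms on a small range of moduli, constants and clock values.
all_equal = all(
    atom_mod_less(x, a, alpha) == rewritten_atom(x, a, alpha)
    for a in range(1, 6)
    for alpha in range(0, 20)
    for x in range(0, 30)
)
```

The residue $b$ is a constant once $a$ and $\alpha$ are fixed, so an automaton can track $x \bmod a$ in its finite control, which is the point of the transformation.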

Thanks to Proposition 4, we are now able to construct a Presburger 
formula describing all the possible durations of runs in $\mathcal{R}(A^v, x_0)$ in terms of the 
parameters. We need the next notation. 

Notation 3. Let $t$ be a variable used to denote a duration and $x$ be a variable for 
a clock value. We call t-atom any $t \sim \alpha$ or $t \sim \alpha - x$. A t-atom is of first type if 
it is of the form 

$t = \alpha$, $t \equiv_{a,>} \alpha$, $t = \alpha - x$ or $t \equiv_{a,>} \alpha - x$; 
it is of second type if it is of the form $t < \alpha - x$. 

Proposition 5. Let $A$ be a simplified automaton. Then there exists a 
Presburger formula $\lambda(t,x,\theta)$ such that for any valuation $v$ and any clock value $x_0$, 
$\lambda(t_0, x_0, v(\theta))$ is TRUE iff there exists a run in $\mathcal{R}(A^v, x_0)$ with duration $t_0$. This 
formula is a disjunction of formulae of the form 

$\lambda_t \wedge \lambda_< \wedge \lambda_x \wedge \lambda_\theta$, 

where $\lambda_t$ is a first type t-atom, $\lambda_<$ is a conjunction of t-atoms of second type, $\lambda_x$ 
is an x-conjunction and $\lambda_\theta$ is a $\theta$-conjunction. Its construction is effective. 







Fig. 2. A reset-free normalized automaton which is simplified 



Let us explain this proposition on an example. 

Example. Consider the simplified automaton A of Figure 2 with I = {i}, F = 
{/}• We denote by to the duration of any run (i, xq) (/, •) in TZ{A'’ , xo) , where 
V is a fixed parameter valuation. Every run has to pass through state q with T{q) 
equal to x = 0i. Let us study the possible durations t\ of runs p\ = (i,xo) 

{q, •). Each duration t\ must be equal to v{9i) — xq. For runs pi using the cycle, 
constraint > w(02) holds and ti has the form m > 0. The unique run 

Pi not using the cycle is not constrained and its duration equals t\ = 2. Now any 
duration to can be decomposed as to = ti + 2n + 1 = v{9i) — xq + 2n + 1, n > 0. 
Due to the x-atom x < 02 of I{f), we get another constraint Xo + to < In 

summary, we have 

[(w(6»i) - xo =i,> 3 A v{0i) > v{02)) V v{0i) - xo = 2] 

A [to = 2 ,> v{0i) - Xo + 1] 

A [xo + to < v{02)\ 

We get the next Presburger formula A(t, x, 0) 

[(x =i,< 0\ — 3 A > ^ 2 ) V X = — 2] 

A [t = 2 ,> 01 + 1 — x] 

A \t < 02 — x] 

such that there exists a run in TZ{A'’,Xo) with duration to iff A(to, Xo, ?^(6*)) is 
TRUE. This formula is in the form of Proposition 5 when it is rewritten as a 
disjunction of conjunctions of t-atoms, x-atoms and 0-atoms. 

Thanks to this example, we can give some ideas of the proof of Proposition 
5. For any state $q \in Q \setminus (I \cup F)$ of a simplified automaton, the invariant $\mathcal{I}(q)$ 
contains at most one x-atom which is of the form $x = \alpha$. The proof is by induction 
on these x-atoms. Given an x-atom $x = \alpha$ contained in some $\mathcal{I}(q)$, any run $\rho$ 
in $\mathcal{R}(A^v, x_0)$ passing through state $q$ can be decomposed as $(i,x_0) \leadsto (q,x_1)$ 
and $(q,x_1) \leadsto (f,x_2)$. Its duration $t_0$ can also be decomposed as $t_1 + t_2$ with 
the constraint that the clock value $x_0 + t_1$ must satisfy $x = \alpha$. It follows that 
$t_0 = v(\alpha) - x_0 + t_2$. The durations $t_1$ and $t_2$ and the related constraints can be 
computed by induction. When there is no x-atom in the automaton (base case), 
only $\theta$-atoms can appear in invariants. Runs are therefore partitioned according 
to the set of $\theta$-atoms that constrain them. Their durations can be described as 
fixed values or arithmetic progressions. 



4.2 Durations in General 

In this subsection, $A = (Q, E, \mathcal{L}, \mathcal{I})$ is any automaton as in Definition 1 (the 
clock can be reset). We fix two states $q$, $q'$, a parameter valuation $v$ and a clock 
value $x_0$. Let us study the set $\mathcal{R}_{q,q'}(A^v, x_0)$ of runs $\rho = (q, x_0) \leadsto (q', \cdot)$ in $A^v$. 

As this automaton can be supposed normalized, the edges $(p,\tau,g,r,q)$ 
entering a given state $q$ all have the same $r$. We call $q$ a reset-state in case $r = \{x\}$. A 
run $\rho$ in $\mathcal{R}_{q,q'}(A^v, x_0)$ possibly contains some reset-states. It thus decomposes as 
a sequence of $k \geq 1$ runs $\rho_i$, $1 \leq i \leq k$, such that each $\rho_i$ contains no reset-state, 
except possibly for the first and the last configurations of $\rho_i$. The total duration 
$D_\rho$ is equal to the sum $\sum_{1 \leq i \leq k} D_{\rho_i}$. 

Using standard constructions in automata theory, it is not difficult to isolate 
runs like the runs $\rho_i$, $1 \leq i \leq k$. For any couple of reset-states $p$ and $p'$, we 
can construct from $A$ a reset-free automaton $A_{p,p'}$ whose runs from $p$ to $p'$ use 
no intermediate reset-state. Therefore thanks to Proposition 5, the durations 
of these runs can be described by a Presburger formula which is a disjunction 
$\bigvee_j \lambda_j^{p,p'}$ of formulae $\lambda_j^{p,p'}$ of the form $\lambda_t \wedge \lambda_< \wedge \lambda_x \wedge \lambda_\theta$ given in Proposition 5. 

For each couple $p,p'$ and each $j$, we associate a distinct letter $b_{p,p',j}$ to each 
formula $\lambda_j^{p,p'}$. The set of all these letters is denoted by $B$. The letter $b_{p,p',j}$ 
symbolically represents formula $\lambda_j^{p,p'}$ and thus $\bigcup_j b_{p,p',j}$ symbolically represents 
durations of the runs of $A_{p,p'}$ going from $p$ to $p'$. 

Next, we can construct another automaton from $A$ in a way to show how a 
run $\rho$ of $\mathcal{R}_{q,q'}(A^v, x_0)$ decomposes into a sequence of runs $\rho_i$ according to 
reset-states as described above. This is a classical automaton $\mathcal{B}$ over the alphabet $B$ 
whose states are $q$, $q'$ and the reset-states of $A$. An edge from a reset-state $p$ to a 
reset-state $p'$ which is labeled by letter $b_{p,p',j}$ symbolically represents durations 
of some runs of $A_{p,p'}$ going from $p$ to $p'$. Therefore the set of durations of runs 
of $\mathcal{R}_{q,q'}(A^v, x_0)$ is symbolically represented by the rational subset $L_{q,q'}$ of words 
of $B^*$ that are the labels of the paths of $\mathcal{B}$ going from $q$ to $q'$. 

Thanks to the symbolic representation of $\mathcal{R}_{q,q'}(A^v, x_0)$ by the rational subset 
$L_{q,q'}$ and because our logic is restricted to $\mathrm{PTCTL}_{\{<,\leq,\geq,>\}}$, it is possible to 
prove Propositions 1 and 2. It should be noted that the set $\mathcal{R}_{q,q'}(A^v, x_0)$ cannot 
be symbolically described by a Presburger formula as in Proposition 5, otherwise 
the model-checking problem for PTCTL would be decidable (see Theorem 1). 
Abstract. In this paper we give two equivalent characterizations of the 
Caucal hierarchy, a hierarchy of infinite graphs with a decidable monadic 
second-order (MSO) theory. It is obtained by iterating the graph trans- 
formations of unfolding and inverse rational mapping. The first charac- 
terization sticks to this hierarchical approach, replacing the language- 
theoretic operation of a rational mapping by an MSO-transduction and 
the unfolding by the treegraph operation. The second characterization 
is non-iterative. We show that the family of graphs of the Caucal 
hierarchy coincides with the family of graphs obtained as the $\varepsilon$-closure of 
configuration graphs of higher-order pushdown automata. 

While the different characterizations of the graph family show their ro- 
bustness and thus also their importance, the characterization in terms 
of higher-order pushdown automata additionally yields that the graph 
hierarchy is indeed strict. 



1 Introduction 

Classes of finitely generated infinite graphs enjoying a decidable theory are an 
active subject of current research. Interest arises from applications in model 
checking of infinite structures (e.g. transition systems, unfoldings of Kripke struc- 
tures) as well as from a theoretical point of view since the border to undecid- 
ability is very close and even for very regular structures many properties become 
undecidable. 

We are interested in a hierarchy of infinite graphs with a decidable monadic 
second-order (MSO) theory which was introduced by D. Caucal in [7]. Starting 
from the class of finite graphs two operations preserving the decidability of the 
MSO-theory are applied, the unfolding [9] and inverse rational mappings [6]. 
Iterating these operations we obtain the hierarchy $(\mathrm{Graph}(n))_{n \in \mathbb{N}}$ where $\mathrm{Graph}(n)$ 
is the class of all graphs which can be obtained from some finite graph by an 
$n$-fold iteration of unfolding followed by an inverse rational mapping. This 
hierarchy of infinite graphs contains several interesting families of graphs (see [7, 
17]) and has already been subject to further studies [2]. 
The first level contains exactly the prefix-recognizable graphs [6], which can 
in turn be characterized as the graphs definable in $\Delta_2$ (the infinite binary tree) by 
an MSO-transduction [1], or as the $\varepsilon$-closure of configuration graphs of pushdown 
automata [16] (see [1] for an overview). We extend these characterizations to 
higher levels. 

In Sect. 3 we show that the iteration of the treegraph operation, a variant 
of the tree-iteration [18], and MSO-transductions generates exactly the Cau- 
cal hierarchy. These two operations are to our knowledge the strongest graph 
transformations which preserve the decidability of the MSO-theory. In [17] the 
hierarchy was defined starting from the infinite binary tree by iterating MSO- 
interpretations (particular MSO-transductions) and unfolding. Since the unfold- 
ing is definable inside the graph obtained by the treegraph operation, it follows 
from our result that these definitions are indeed equivalent. 

Pushdown automata can also be seen as the first level of a hierarchy of higher- 
order pushdown automata, whose stack entries are not only single letters (as for 
level 1), but words (level 2), words of words (level 3) .... Similar hierarchies 
have already been considered in [14,11,15]. 

In Sect. 4 we show that a graph is on level n of the Caucal hierarchy iff it is 
the $\varepsilon$-closure of a configuration graph of a higher-order pushdown automaton of 
level n. This result is incorrectly attributed to [2,7] in [17]. In [2], in the context of 
game simulation, only the easier direction from higher-order pushdown graphs to 
graphs in the hierarchy is shown. All the proofs in Sections 3 and 4 are effective. 

In Sect. 5 we use the characterization of the hierarchy in terms of higher-order 
pushdown automata to show that it is strict. Moreover we exhibit a generator 
for every level, i.e. every graph on this level can be obtained from the generator 
by applying a rational marking and an inverse rational mapping, or an 
MSO-interpretation. Finally we give an example of a graph with a decidable MSO-theory 
which is not in the hierarchy. 



2 Preliminaries 

2.1 Operations on Graphs and the Caucal Hierarchy 

We fix a countable set $A$, also called alphabet. Let $\Sigma \subseteq A$ be a finite subset of 
edge labels. A $\Sigma$-labeled graph $G$ is a tuple $(V^G, (E_a^G)_{a \in \Sigma})$ where $V^G$ is a set of 
vertices and for $a \in \Sigma$ we denote by $E_a^G \subseteq V^G \times V^G$ the set of $a$-labeled edges 
of $G$. We assume that $V^G$ is at most countable, and that there are no isolated 
vertices in $G$, i.e. for every $v \in V^G$ there exists a $w \in V^G$ such that $(v,w) \in E_a^G$ 
or $(w,v) \in E_a^G$ for some $a \in \Sigma$. If the graph $G$ and the set of edge labels $\Sigma$ 
are clear from the context we drop the superscript $G$ and speak just of a labeled 
graph. A graph is called deterministic if $(v,w) \in E_a$ and $(v,w') \in E_a$ implies 
$w = w'$ for all $v,w,w' \in V$ and $a \in \Sigma$. 
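For finite graphs, these definitions translate directly into code. The following is a minimal sketch, assuming a graph is given as a collection of `(source, label, target)` triples (the vertex set is then implicit, matching the absence of isolated vertices); the function name is hypothetical.

```python
def is_deterministic(edges):
    """Check that no vertex has two distinct a-successors for the same label a.

    `edges` is an iterable of (source, label, target) triples.
    """
    successor = {}
    for v, a, w in edges:
        # setdefault records the first a-successor of v; a different
        # second a-successor witnesses non-determinism.
        if successor.setdefault((v, a), w) != w:
            return False
    return True
```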

A path from a vertex $u$ to a vertex $v$ labeled by $w = a_1 \dots a_{n-1}$ is a sequence 
$v_1 a_1 \dots a_{n-1} v_n \in V(\Sigma V)^*$ such that $v_1 = u$, $v_n = v$ and $(v_i, v_{i+1}) \in E_{a_i}$ for 
every $i \in \{1, \dots, n-1\}$. In this case we will also write $u \xrightarrow{w} v$. A tree $T$ is a 
graph containing a vertex $r$ called the root such that for any vertex $v \in V^T$ there 
exists a unique path from $r$ to $v$. 

The unfolding $\mathrm{Unf}(G, r)$ of a graph $G = (V^G, (E_a^G)_{a \in \Sigma})$ from a node $r \in V^G$ 
is the tree $T = (V^T, (E_a^T)_{a \in \Sigma})$ where $V^T$ is the set of all paths starting from $r$ 
in $G$ and for all $a \in \Sigma$, $(w, w') \in E_a^T$ iff $w' = w \cdot a \cdot v$ for some $v \in V^G$. 
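Since the unfolding is in general infinite, it can only be explored up to a bounded depth. The sketch below illustrates the definition on finite graphs under that restriction; the triple representation of edges and the `depth` bound are assumptions of this illustration, not part of the definition.

```python
def unfold(edges, root, depth):
    """Edges of Unf(G, root) restricted to paths with at most `depth` edges.

    A tree node is a path of G, stored as a tuple (v1, a1, v2, ..., vn).
    """
    successors = {}
    for v, a, w in edges:
        successors.setdefault(v, []).append((a, w))

    tree_edges = []
    frontier = [(root,)]              # the root of the tree is the trivial path
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            for a, w in successors.get(path[-1], []):
                child = path + (a, w)  # w' = w . a . v in the definition
                tree_edges.append((path, a, child))
                next_frontier.append(child)
        frontier = next_frontier
    return tree_edges
```

For instance, unfolding a single vertex carrying an `a`-loop and a `b`-loop yields the infinite binary tree, of which the sketch returns the first `depth` levels.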

The treegraph $\mathrm{Treegraph}(G, \sharp)$ of a graph $G = (V, (E_a)_{a \in \Sigma})$ by a 
symbol $\sharp \notin \Sigma$ is the graph $G' = (V^+, (E'_a)_{a \in \Sigma \cup \{\sharp\}})$ where $V^+$ designates the 
set of all finite non-empty sequences of elements of $V$, for all $a \in \Sigma$ and all 
$w \in V^*$, $(wu, wv) \in E'_a$ iff $(u,v) \in E_a$, and $E'_\sharp = \{(wu, wuu) \mid w \in V^*, u \in V\}$. 
The tree-iteration as defined in [18] also contains a son-relation given by 
$\{(w, wu) \mid w \in V^* \text{ and } u \in V\}$. If $G$ is connected then the son-relation can be 
defined in the treegraph. 
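As a concrete reading of this definition, the following sketch enumerates the part of the treegraph whose vertices are sequences of length at most `max_len`. The length bound and the triple representation of edges are assumptions made only to keep the example finite.

```python
from itertools import product


def treegraph(vertices, edges, sharp, max_len):
    """Edges of Treegraph(G, sharp) between sequences of length <= max_len.

    `vertices` is a list, `edges` a collection of (u, label, v) triples,
    and vertices of the treegraph are non-empty tuples over `vertices`.
    """
    result = set()
    for n in range(max_len):                      # n is the length of the prefix w
        for w in product(vertices, repeat=n):
            for u, a, v in edges:
                result.add((w + (u,), a, w + (v,)))        # copy of G below w
            if n + 2 <= max_len:
                for u in vertices:
                    result.add((w + (u,), sharp, w + (u, u)))  # duplicate last element
    return result
```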

Let $\bar\Sigma$ be a set of symbols disjoint from but in bijection with $\Sigma$. We extend 
every $\Sigma$-labeled graph $G$ to a $(\Sigma \cup \bar\Sigma)$-labeled graph $\bar G$ by adding reverse edges 
$E_{\bar a} := \{(u,v) \mid (v,u) \in E_a\}$. Let $\Gamma \subseteq A$ be a set of edge labels. A rational 
mapping is a mapping $h : \Gamma \to \mathcal{P}((\Sigma \cup \bar\Sigma)^*)$ which associates to every symbol 
from $\Gamma$ a regular subset of $(\Sigma \cup \bar\Sigma)^*$. If $h(a)$ is finite for every $a \in \Gamma$ we also 
speak of a finite mapping. We apply a rational mapping $h$ to a $\Sigma$-labeled graph 
$G$ by the inverse to obtain a $\Gamma$-labeled graph $h^{-1}(G)$ with $(u,v) \in E_b$ iff there is 
a path from $u$ to $v$ in $\bar G$ labeled by a word in $h(b)$. The set of vertices of $h^{-1}(G)$ 
is given implicitly by the edge relations. We also speak of $h^{-1}(G)$ as the graph 
obtained from $G$ by the inverse rational mapping $h^{-1}$. 
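For a finite graph and a finite mapping, the inverse mapping can be computed by following each word of $h(b)$ from every vertex. The sketch below makes two assumptions of its own: labels are strings, and the barred copy $\bar a$ of a label `a` is written `"~a"`.

```python
def inverse_finite_mapping(edges, h):
    """Apply the inverse of a finite mapping h to a finite labeled graph.

    `edges`: collection of (u, a, v) triples with string labels.
    `h`: dict sending each new label b to a set of words, each word being a
         tuple of letters "a" (forward edge) or "~a" (reverse edge).
    """
    step = {}
    for u, a, v in edges:
        step.setdefault((u, a), set()).add(v)
        step.setdefault((v, "~" + a), set()).add(u)   # reverse edge of the extended graph

    vertices = {x for u, _, v in edges for x in (u, v)}
    result = set()
    for b, words in h.items():
        for word in words:
            for u in vertices:
                current = {u}
                for letter in word:            # follow the word letter by letter
                    current = {y for x in current for y in step.get((x, letter), ())}
                result.update((u, b, v) for v in current)
    return result
```

For example, on a vertex with two `a`-successors, the word $\bar a a$ relates each successor to itself and to its sibling.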

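The marking $\mathcal{M}_\sharp(G, X)$ of a graph $G = (V, (E_a)_{a \in \Sigma})$ on a set of vertices 
$X \subseteq V$ by a symbol $\sharp \notin \Sigma$ is the graph $G' = (V', (E'_a)_{a \in \Sigma \cup \{\sharp\}})$ where $V' = 
\{(x,0) \mid x \in V\} \cup \{(x,1) \mid x \in X\}$, $E'_a = \{((x,0),(y,0)) \mid (x,y) \in E_a\}$ for 
$a \in \Sigma$, and $E'_\sharp = \{((x,0),(x,1)) \mid x \in X\}$. A rational marking of a graph 
$G = (V, (E_a)_{a \in \Sigma})$ by a symbol $\sharp \notin \Sigma$ with a rational subset $R$ over $\Sigma \cup \bar\Sigma$ from 
a vertex $r \in V$ is the graph $\mathcal{M}_\sharp(G, \{x \in V \mid r \xrightarrow{w} x,\ w \in R\})$. An 
MSO-marking of a graph $G$ by a symbol $\sharp$ with an MSO-formula $\varphi(x)$ is the graph 
$\mathcal{M}_\sharp(G, \{v \in V^G \mid G \models \varphi(v)\})$. 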
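The marking operation itself is elementary once the set $X$ is known. A minimal sketch for finite graphs, with edges as triples and a hypothetical default `"#"` standing for the symbol $\sharp$:

```python
def marking(edges, marked, sharp="#"):
    """Edges of the marking of a finite graph on the vertex set `marked`.

    Every original vertex x becomes (x, 0); every marked vertex x gets an
    extra copy (x, 1) reached from (x, 0) by a sharp-labeled edge.
    """
    result = {((x, 0), a, (y, 0)) for x, a, y in edges}
    result |= {((x, 0), sharp, (x, 1)) for x in marked}
    return result
```

A rational or MSO-marking differs only in how the set `marked` is specified, by a rational language of path labels or by a formula respectively.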

Following [7], we define $\mathrm{Graph}(0)$ to be the class containing for every finite 
subset $\Sigma \subseteq A$ all finite $\Sigma$-labeled graphs, and for all $n \geq 0$ 

$\mathrm{Tree}(n+1) := \{\mathrm{Unf}(G, r) \mid G \in \mathrm{Graph}(n),\ r \in V^G\}$, 

$\mathrm{Graph}(n+1) := \{h^{-1}(T) \mid T \in \mathrm{Tree}(n+1),\ h^{-1} \text{ an inverse rational mapping}\}$, 

where we do not distinguish between isomorphic graphs. 

2.2 Monadic Second-Order Logic and Transductions 

We define the monadic second-order logic over $\Sigma$-labeled graphs as usual (see 
e.g. [13]), i.e. we view a graph as a relational structure over the signature 
consisting of the binary relation symbols $(E_a)_{a \in \Sigma}$. 

A formula $\varphi(X_1, \dots, X_k)$ containing at most the free variables $X_1, \dots, X_k$ 
is evaluated in $(G, \mathcal{V})$ where $G = (V, (E_a)_{a \in \Sigma})$ is a $\Sigma$-labeled graph and $\mathcal{V} : 
V \to \mathcal{P}(\{1, \dots, k\})$ is a function which assigns to every vertex $v$ of $G$ a set $\mathcal{V}(v)$ 
such that $v \in X_i$ iff $i \in \mathcal{V}(v)$. We write $(G, \mathcal{V}) \models \varphi(X_1, \dots, X_k)$, or equivalently 
$G \models \varphi[V_1, \dots, V_k]$ where $V_i := \{v \in V \mid i \in \mathcal{V}(v)\}$, if $\varphi$ holds in $G$ under the 
given valuation $\mathcal{V}$. 

An MSO-interpretation of $\Gamma$ in $\Sigma$ is a family $\mathcal{I} = (\varphi_a(x,y))_{a \in \Gamma}$ of 
MSO-formulas over $\Sigma$. Applying an MSO-interpretation $\mathcal{I} = (\varphi_a(x,y))_{a \in \Gamma}$ of $\Gamma$ in 
$\Sigma$ to a $\Sigma$-labeled graph $G$ we obtain a $\Gamma$-labeled graph $\mathcal{I}(G)$ where the edge 
relation $E_a^{\mathcal{I}(G)}$ is given by the pairs of vertices for which $\varphi_a(x,y)$ is satisfied in $G$, 
and $V^{\mathcal{I}(G)}$ is given implicitly as the set of all vertices occurring in the relations 
$E_a^{\mathcal{I}(G)}$. Note that the addition of an MSO-formula $\delta(x)$ to $\mathcal{I}$ defining the vertex 
set explicitly does not increase the power of an interpretation if we require that 
there are no isolated vertices in the resulting graph. 

Interpretations cannot increase the size of a structure. To overcome this 
weakness the notion of a transduction was introduced, cf. [8]. Let $G = (V, (E_a)_{a \in \Sigma})$ 
be a $\Sigma$-labeled graph and $K$ be a finite subset of $A$ disjoint from $\Sigma$. A $K$-copying 
operation for $\Sigma$ associates to $G$ a $(\Sigma \cup K)$-labeled graph $G' = (V', (E'_a)_{a \in \Sigma \cup K})$ 
where $V' = V \cup (V \times K)$, $E'_a := E_a$ for $a \in \Sigma$, and $E'_b := \{(v, (v,b)) \mid v \in V\}$ for 
$b \in K$. An MSO-transduction $\mathcal{T} = (K, \mathcal{I})$ from $\Sigma$ to $\Gamma$ is a $K$-copying operation 
for $\Sigma$ followed by an MSO-interpretation $\mathcal{I}$ of $\Gamma$ in $\Sigma \cup K$. 

Note that an inverse rational mapping is a special case of an MSO-interpre- 
tation and an MSO-marking is a special case of an MSO-transduction. 

2.3 Higher-Order Pushdown Automata 

We follow the definition of [15]. Let $\Gamma$ be a finite set of stack symbols. A level 1 
pushdown stack over $\Gamma$ is a word $w \in \Gamma^*$ in reversed order, i.e. if $w = a_1 \dots a_m$ 
the corresponding stack is denoted by $[a_m, \dots, a_1]$. For $n \geq 2$ a level $n$ 
pushdown stack over $\Gamma$ is inductively defined as a sequence $[s_r, \dots, s_1]$ of level $n-1$ 
pushdown stacks $s_i$ for $1 \leq i \leq r$. $[\varepsilon]$ denotes the empty level 1 stack; the empty 
level $n$ stack, denoted by $[\varepsilon]_n$, is a stack which contains for $1 \leq i < n$ only a 
single empty level $i$ stack. 

The following instructions can be executed on a level 1 stack $[a_m, \dots, a_1]$: 

$\mathrm{push}_1^a([a_m, \dots, a_1]) := [a, a_m, \dots, a_1]$ for every $a \in \Gamma$ 

$\mathrm{pop}_1([a_m, a_{m-1}, \dots, a_1]) := [a_{m-1}, \dots, a_1]$ 

Furthermore we define the following function which does not change the content 
of a stack: 

$\mathrm{top}([\varepsilon]) := \varepsilon$ and $\mathrm{top}([a_m, \dots, a_1]) := a_m$ for $m \geq 1$. 

For a stack $[s_r, \dots, s_1]$ of level $n \geq 2$ we define the following instructions 

$\mathrm{push}_1^a([s_r, \dots, s_1]) := [\mathrm{push}_1^a(s_r), s_{r-1}, \dots, s_1]$ for every $a \in \Gamma$ 

$\mathrm{push}_n([s_r, \dots, s_1]) := [s_r, s_r, \dots, s_1]$ 

$\mathrm{push}_k([s_r, \dots, s_1]) := [\mathrm{push}_k(s_r), s_{r-1}, \dots, s_1]$ for $2 \leq k < n$ 

$\mathrm{pop}_n([s_r, \dots, s_1]) := [s_{r-1}, \dots, s_1]$ 

$\mathrm{pop}_k([s_r, \dots, s_1]) := [\mathrm{pop}_k(s_r), s_{r-1}, \dots, s_1]$ for $1 \leq k < n$ 

and extend top to a level $n$ stack by setting $\mathrm{top}([s_r, \dots, s_1]) := \mathrm{top}(s_r)$. 

We denote by $\mathrm{Instr}_n$ the set of instructions that can be applied to a level $n$ 
stack (without the top function). For convenience we also add an identity 
function denoted by $-$ which does not change the stack at all. 

The instruction $\mathrm{push}_1^a$ adds the symbol $a$ to the topmost level 1 stack, while 
$\mathrm{push}_k$ duplicates the topmost level $k-1$ stack completely. Similarly $\mathrm{pop}_1$ removes 
the top symbol of the topmost level 1 stack, while $\mathrm{pop}_k$ for $1 < k \leq n$ removes 
the corresponding level $k-1$ stack completely. Note that the instruction $\mathrm{pop}_k$ 
for $2 \leq k \leq n$ can only be applied if the resulting stack is again a level $n$ stack, 
i.e. it does not remove a bottom level $k-1$ stack. 
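The instructions can be sketched on nested tuples, with the top element stored first so that a tuple reads like $[s_r, \dots, s_1]$. This is only an illustration under assumptions of ours: the level $n$ is passed explicitly, `None` stands both for $\varepsilon$ in `top` and for "instruction not applicable" in `pop`, and all function names are hypothetical.

```python
def empty(n):
    """The empty level-n stack: nested one-element tuples around ()."""
    s = ()
    for _ in range(n - 1):
        s = (s,)
    return s


def push_sym(a, s, n):
    """push_1^a: add symbol a to the topmost level-1 stack of a level-n stack."""
    if n == 1:
        return (a,) + s
    return (push_sym(a, s[0], n - 1),) + s[1:]


def push(k, s, n):
    """push_k for 2 <= k <= n: duplicate the topmost level k-1 stack."""
    if k == n:
        return (s[0],) + s
    return (push(k, s[0], n - 1),) + s[1:]


def pop(k, s, n):
    """pop_k for 1 <= k <= n; returns None when it cannot be applied."""
    if k == n:
        if n == 1:
            return s[1:] if s else None
        # For k >= 2 the bottom level k-1 stack must never be removed.
        return s[1:] if len(s) > 1 else None
    inner = pop(k, s[0], n - 1)
    return None if inner is None else (inner,) + s[1:]


def top(s, n):
    """Top symbol of the topmost level-1 stack, or None for the empty one."""
    for _ in range(n - 1):
        s = s[0]
    return s[0] if s else None
```

Tuples are immutable, so $\mathrm{push}_k$ shares the duplicated substack instead of copying it, which keeps each instruction cheap in this representation.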

A higher-order pushdown automaton of level $n$ is a tuple $A = (Q, \Sigma, \Gamma, q_0, \Delta)$ 
where $Q$ is a finite set of states, $\Sigma$ is an input alphabet, $\Gamma$ is a stack alphabet, 
$q_0 \in Q$ is an initial state, and $\Delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times Q \times \mathrm{Instr}_n$ is 
a transition relation. A configuration of $A$ is a pair $(q, [s_r, \dots, s_1])$ where $q$ is a 
state of $A$ and $[s_r, \dots, s_1]$ is a stack of level $n$. The initial configuration $(q_0, [\varepsilon]_n)$ 
consists of the initial state $q_0$ and the empty level $n$ stack $[\varepsilon]_n$. $A$ can reach a 
configuration $(q', [s'_{r'}, \dots, s'_1])$ from $(q, [s_r, \dots, s_1])$ by reading $a \in \Sigma \cup \{\varepsilon\}$ if 
there is a transition $(q, a, \mathrm{top}([s_r, \dots, s_1]), q', i) \in \Delta$ such that $i([s_r, \dots, s_1]) = 
[s'_{r'}, \dots, s'_1]$. The automaton $A$ accepts a word $w \in \Sigma^*$ if $A$ reaches from the 
initial configuration the empty level $n$ stack after reading $w$. We denote by 
$\mathrm{HOPDA}(n)$ the class of all higher-order pushdown automata of level $n$. 

3 Closure Properties 

In this part, we prove that the hierarchy is closed under MSO-transductions and 
the treegraph operation. We first consider the case of deterministic trees. 

3.1 The Deterministic Case 

We consider a sub-hierarchy obtained by unfolding only deterministic graphs. 
$\mathrm{Graph}_d(0)$ is equal to $\mathrm{Graph}(0)$. $\mathrm{Tree}_d(n+1)$ contains the unfoldings of every 
deterministic graph $G \in \mathrm{Graph}_d(n)$ from a vertex in $V^G$. $\mathrm{Graph}_d(n)$ is defined in 
the same way as $\mathrm{Graph}(n)$. Note that $\mathrm{Graph}_d(n)$ also contains non-deterministic 
graphs. 

Closure under MSO-transductions. Using results from [3], we prove that 
for all $n \in \mathbb{N}$, $\mathrm{Graph}_d(n)$ is closed under MSO-transductions. This result was 
obtained for the first level by A. Blumensath in [1]. Obviously, $\mathrm{Tree}_d(n)$ is not 
closed under MSO-transductions but if we consider only MSO-markings, we 
also obtain a closure property for $\mathrm{Tree}_d(n)$. 

Proposition 1. For all $n > 0$, all trees $T \in \mathrm{Tree}_d(n)$ and all graphs $G \in 
\mathrm{Graph}_d(n)$, we have that: 

1. $\mathcal{M}(T)$ also belongs to $\mathrm{Tree}_d(n)$, for any MSO-marking $\mathcal{M}$, 

2. $\mathcal{T}(G)$ also belongs to $\mathrm{Graph}_d(n)$, for any MSO-transduction $\mathcal{T}$. 




The Caucal Hierarchy of Infinite Graphs 117 



Proof (sketch): These results are proved by induction on the level using partial 
commutation results of MSO-transductions and unfolding obtained in [3] . 

1. For every deterministic graph $G$ and every MSO-marking $\mathcal{M}$, there 
exists an MSO-transduction $\mathcal{T}'$ and a vertex $r'$ such that $\mathcal{M}(\mathrm{Unf}(G, r)) \approx 
\mathrm{Unf}(\mathcal{T}'(G), r')$. 

2. For every deterministic graph $G$ and every MSO-transduction $\mathcal{T}$, there exists 
an MSO-transduction $\mathcal{T}'$, a rational mapping $h$ and a vertex $r'$ such that 
$\mathcal{T}(\mathrm{Unf}(G, r)) \approx h^{-1}(\mathrm{Unf}(\mathcal{T}'(G), r'))$. 

Note that in both cases T' preserves determinism. □ 



Closure Under the Treegraph Operation. The unfolding is a particular case 
of the treegraph operation in the sense that for any graph $G$ the unfolding from 
a definable vertex $r$, $\mathrm{Unf}(G, r)$, can be obtained by an MSO-interpretation from 
$\mathrm{Treegraph}(G, \sharp)$ (see [9]). In the case of deterministic trees, we show a converse 
result: how to obtain the treegraph using MSO-interpretations and unfolding. This 
construction is due to T. Colcombet. 

Lemma 1. For any finite set of labels $\Sigma$, there exist two finite mappings $h_1, h_2$ 
and a rational marking $\mathcal{M}$ such that for any deterministic tree $T$ with root $r$: 

$\mathrm{Treegraph}(T, \sharp) \approx h_2^{-1}(\mathcal{M}(\mathrm{Unf}(h_1^{-1}(T), r)))$. 



Proof (sketch): The finite mapping $h_1$ adds backward edges labeled by elements 
of $\bar\Sigma$ and a loop labeled by $\sharp$ to every vertex. Thus, for all $a \in \Sigma$, $h_1$ is defined 
by $h_1(a) = \{a\}$, $h_1(\bar a) = \{\bar a\}$ and $h_1(\sharp) = \{\varepsilon\}$. 

Let $H$ be the deterministic tree equal to $\mathrm{Unf}(h_1^{-1}(T), r)$; every node $x$ of 
$H$ is uniquely characterized by a word in $(\Sigma \cup \bar\Sigma \cup \{\sharp\})^*$. The rational marking 
$\mathcal{M}_\sharp$ marks all the vertices corresponding to a word which does not contain 
$x\bar x$ or $\bar x x$ for $x \in \Sigma$. Finally, $h_2$ is used to erase unmarked vertices and to 
reverse the remaining edges with labels in $\bar\Sigma$. $h_2$ is given by $h_2(\sharp) = \{\sharp\}$ and 
$h_2(a) =$ for $a \in \Sigma$. □ 

Figure 1 illustrates the construction above on the semi-infinite line. The filled 
dots represent the vertices marked by $\mathcal{M}_\sharp$. The closure of the deterministic 
hierarchy under the treegraph operation is obtained from Lem. 1 and Prop. 1, 
using the fact that for all trees $T$ and all rational mappings $h$ which do not 
contain $\sharp$, $\mathrm{Treegraph}(h^{-1}(T), \sharp) = h_\sharp^{-1}(\mathrm{Treegraph}(T, \sharp))$ where $h_\sharp$ designates the 
rational mapping that extends $h$ with $h_\sharp(\sharp) = \{\sharp\}$. 

Proposition 2. For all $n > 0$, if $G \in \mathrm{Graph}_d(n)$ then $\mathrm{Treegraph}(G, \sharp) \in 
\mathrm{Graph}_d(n+1)$. 

Fig. 1. The semi-infinite line after applying $h_1$ and its unfolding 



3.2 Deterministic Trees Are Enough 

We now prove that for all $n$, $\mathrm{Graph}(n)$ is equal to $\mathrm{Graph}_d(n)$. From the technical 
point of view, this means that even if the hierarchy contains non-deterministic 
graphs and even graphs of infinite out-degree, we can always work with an 
“underlying” deterministic tree. 

Lemma 2. For all $n > 0$, if $G \in \mathrm{Graph}(n)$, then there exists a deterministic 
tree $T \in \mathrm{Tree}_d(n)$ and a rational mapping $h$ such that $G = h^{-1}(T)$. 



Proof (sketch): The proof proceeds by induction on the level $n$. Let $T \in 
\mathrm{Tree}(n+1)$; we want to prove that $T$ belongs to $\mathrm{Graph}_d(n+1)$. By 
definition of $\mathrm{Tree}(n+1)$ and by induction hypothesis, we have $T \approx \mathrm{Unf}(h^{-1}(T_d), s)$ 
for some deterministic tree $T_d \in \mathrm{Tree}_d(n)$ and some rational mapping $h$. Using 
the fact that the unfolding can be defined in the treegraph operation (see [9]), we 
have $T \approx \mathcal{T}(\mathrm{Treegraph}(h^{-1}(T_d), \sharp))$ for some MSO-transduction $\mathcal{T}$. If $h_\sharp$ denotes 
the rational mapping obtained by extending $h$ with $h_\sharp(\sharp) = \{\sharp\}$, we have $T = 
\mathcal{T}(h_\sharp^{-1}(\mathrm{Treegraph}(T_d, \sharp)))$. Applying Lem. 1, we have $T = \mathcal{T}'(\mathrm{Unf}(h_1^{-1}(T_d), r))$ 
where $\mathcal{T}' = \mathcal{M} \circ h_2^{-1} \circ h_\sharp^{-1} \circ \mathcal{T}$. It is easy to check that $\mathrm{Unf}(h_1^{-1}(T_d), r)$ belongs 
to $\mathrm{Tree}_d(n+1)$. Using Prop. 1, we prove that $T$ belongs to $\mathrm{Graph}_d(n+1)$. The 
case of $G \in \mathrm{Graph}(n+1)$ is easily derived from this. □ 

We can now prove that every graph of the hierarchy has a decidable MSO- 
theory. Note that this does not follow directly from the definition because un- 
folding from an arbitrary (i.e. not necessarily MSO-definable) vertex does not 
preserve the decidability of MSO-logic. However, using Lem. 2 we can always 
come back to the case where we unfold from an MSO-definable vertex (see [4] for 
more details). 

Theorem 1. Each graph of the hierarchy has a decidable MSO-theory and this 
remains true if we add to MSO-logic the predicates $|X| < \infty$ and $|X| = k \bmod p$ 
for all $k, p \in \mathbb{N}$, which are interpreted as "$X$ is finite" and "$X$ has size 
equal to $k$ modulo $p$" respectively. 

Combining Prop. 1, Prop. 2 and Lem. 2, we now have two equivalent char- 
acterizations of the hierarchy: one “minimal” in terms of unfolding and inverse 
rational mappings and one “maximal” in terms of the treegraph operation and 
MSO-transductions. The maximal characterization shows the robustness of the 
hierarchy and its interest because it combines the two, to our knowledge, most 
powerful MSO-preserving operations. On the other hand, the minimal 
characterization allows us to make the link between the hierarchy and the graphs of 
higher-order pushdown automata. 

Theorem 2. The Caucal hierarchy is equal to the hierarchy obtained by iterating 
the treegraph operation and MSO-transductions. 

4 Higher-Order Pushdown Graphs vs. Caucal Graphs 

In this section we give an automata-theoretic characterization of the classes of 
the Caucal hierarchy. This characterization provides us with a “flat” model for 
describing a graph of any level, i.e. we do not have to refer to a sequence of 
operations. Furthermore it extends the characterization of the first level of the 
hierarchy as the $\varepsilon$-closure of configuration graphs of pushdown automata given 
in [16] to any level. We recall some definitions. 

The configuration graph $C(A)$ of $A \in \mathrm{HOPDA}(n)$ is the graph of all 
configurations of $A$ reachable from the initial configuration, with an edge labeled by 
$a \in \Sigma \cup \{\varepsilon\}$ from $(q,s)$ to $(q',s')$ iff there is a transition $(q, a, \mathrm{top}(s), q', i) \in \Delta$ 
such that $i(s) = s'$. 

Let $C(A)$ be the configuration graph of $A \in \mathrm{HOPDA}(n)$. We will assume for 
the remainder of the paper that for every pair $(q, a)$ of state $q$ and top stack 
symbol $a$ only $\varepsilon$-transitions or only non-$\varepsilon$-transitions are possible. The $\varepsilon$-closure 
of $C(A)$ is the graph $G$ obtained from $C(A)$ by removing all vertices with only 
outgoing $\varepsilon$-transitions and adding an $a$-labeled edge between $v$ and $w$ if there is 
an $a$-labeled path from $v$ to $w$ in $C(A)$. 
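On a finite graph the $\varepsilon$-closure can be computed directly. The sketch below is an illustration under stated assumptions: edges are triples, the empty string stands for the label $\varepsilon$, every vertex has either only $\varepsilon$-labeled or only non-$\varepsilon$ outgoing edges (as assumed in the text), and an "$a$-labeled path" is read as one $a$-edge followed by any number of $\varepsilon$-edges.

```python
EPS = ""  # label standing for epsilon (a convention of this sketch)


def eps_closure(edges):
    """Epsilon-closure of a finite graph given as (u, label, v) triples."""
    successors = {}
    for u, a, v in edges:
        successors.setdefault(u, []).append((a, v))

    def eps_reach(start):
        # All vertices reachable from `start` through epsilon-edges only.
        seen, todo = {start}, [start]
        while todo:
            x = todo.pop()
            for a, y in successors.get(x, []):
                if a == EPS and y not in seen:
                    seen.add(y)
                    todo.append(y)
        return seen

    vertices = {x for u, _, v in edges for x in (u, v)}
    # Keep the vertices that have no outgoing epsilon-edge.
    kept = {x for x in vertices if all(a != EPS for a, _ in successors.get(x, []))}

    closure = set()
    for v in kept:
        for a, y in successors.get(v, []):
            for w in eps_reach(y) & kept:
                closure.add((v, a, w))
    return closure
```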

A higher-order pushdown graph $G$ of level $n$ is the $\varepsilon$-closure of the 
configuration graph of some $A \in \mathrm{HOPDA}(n)$. We call $G$ the higher-order pushdown 
graph generated by $A$ and denote by $\mathrm{HOPDG}(n)$ the class of all higher-order 
pushdown graphs of level $n$ (up to isomorphism). 

This notion of $\varepsilon$-closure was used in [16] to show that the class $\mathrm{HOPDG}(1)$ 
coincides with the class of prefix-recognizable graphs, i.e. with the graphs 
on the first level of the hierarchy. We extend this result to every level of the 
hierarchy. 
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The easier part of the equivalence is to show that every HOPDG of level 
n is a graph on the same level of the hierarchy. The main idea is to find a 
graph in Graph(n) such that every node of this graph can be identified with 
a configuration of a higher-order pushdown automaton, and to construct an 
inverse rational mapping which generates the edges of the configuration graph 
of the automaton. Such a construction is already contained in [2] in a slightly 
different setting. We propose here to use the family Z\” of graphs obtained by 
an (n — l)-fold application of the treegraph operation to the infinite m-ary tree 
Am- This has the advantage that there is almost a one-to-one correspondence 
between configurations of the higher-order pushdown automaton and the vertices 
of the graph. Using the fact that Am G Graph(n) we obtain: 

Lemma 3. If $G \in \mathrm{HOPDG}(n)$ then $G \in \mathrm{Graph}(n)$.

We now turn to the converse direction: every graph $G$ on level $n$ of the hierarchy
is indeed the $\varepsilon$-closure of a configuration graph of a higher-order pushdown
automaton of level $n$. We show this using the following two lemmas.

Lemma 4. If $G \in \mathrm{HOPDG}(n)$ and $r \in V_G$, then $\mathrm{Unf}(G,r) \in \mathrm{HOPDG}(n+1)$.

Lemma 5. If $G \in \mathrm{HOPDG}(n)$, $r \in V_G$ and $h$ is a rational mapping, then
$h^{-1}(\mathrm{Unf}(G,r)) \in \mathrm{HOPDG}(n+1)$.

While the proof of Lem. 4 consists of a straightforward modification of the
HOPDA for $G$, the proof of Lem. 5 requires some technical preparation. We
need to show that for an automaton as constructed in the proof of Lem. 4 there
exists a higher-order pushdown automaton which generates exactly the graph
$\mathrm{Unf}(G,r)$ extended by reverse edges, i.e. for all $v, w \in \mathrm{Unf}(G,r)$,
$v \xrightarrow{a} w$ in $\mathrm{Unf}(G,r)$ iff $w \xrightarrow{\bar{a}} v$ in the extended graph.

To show that such an automaton exists we introduce the notion of a weak 
popping higher-order pushdown automaton. A weak popping automaton is only 
allowed to execute a pop instruction of level j > 2 if the two top level j stacks 
coincide. We skip a formal definition of a weak popping higher-order pushdown 
automaton and just mention that even though this automaton model is equipped 
with a built-in test on the equality of two stacks of the same level, it is equivalent 
to the usual model. All proofs are given in the full version of this article [4] . 

Theorem 3. For every $n \in \mathbb{N}$, $G \in \mathrm{HOPDG}(n)$ iff $G \in \mathrm{Graph}(n)$.

5 More Properties of the Caucal Hierarchy 

We give a generator for each level of the hierarchy. Then we use the traces of 
the graphs of the hierarchy to prove its strictness and to exhibit a graph having 
a decidable MSO-theory which is not in the hierarchy.

5.1 Generators

For the first level of the hierarchy, the infinite binary tree is a generator for
rational markings (without backward edges) from the root and inverse rational
mappings. As hinted by the proof of Lem. 3, a similar result can be obtained
at any level. Recall that $\Delta^n_2$ is the graph obtained from $\Delta_2$ by an $(n-1)$-fold
application of the treegraph operation.

Proposition 3. Every graph $G \in \mathrm{Graph}(n)$ can be obtained from $\Delta^n_2$ by
applying a rational marking (with backward edges) from its source and an inverse
rational mapping.

5.2 On Traces — The Strictness of the Hierarchy 

A direct consequence of Theo. 3 is that the traces of the graphs of level n are 
recognized by a higher-order pushdown automaton of level n. These families of 
languages have been studied by W. Damm and form the OI-hierarchy [11]. The
equivalence between the OI-hierarchy and the traces of higher-order pushdown
automata is proved in [12]. In [10,14], this hierarchy is proved to be strict at 
each level. 

Theorem 4. For all $n \ge 1$,

(a) for all $T \in \mathrm{Tree}(n)$ the branch language of $T$ (i.e. the set of all words labeling
a path from the root to a leaf) is recognized by a HOPDA of level $n-1$.

(b) for all $G \in \mathrm{Graph}(n)$ and $u,v \in V_G$, $L(u,v,G) = \{w \in \Sigma^* \mid u \stackrel{w}{\Longrightarrow} v\}$ is
recognized by a HOPDA of level $n$.

According to Theo. 4, the level-by-level strictness of the OI-hierarchy implies
the strictness of the Caucal hierarchy. An obvious example of a graph which is at
level $n$ but not at level $n-1$ is the generator $\Delta^n_2$. To obtain more natural graphs
that separate the hierarchy, we consider the trees associated to monotonically
increasing mappings $f : \mathbb{N} \to \mathbb{N}$. The tree $T_f$ associated to $f$ is defined by the
following set of edges: $E_a = \{((i,0),(i+1,0)) \mid i \in \mathbb{N}\}$ and
$E_b = \{((i,j),(i,j+1)) \mid i \in \mathbb{N} \text{ and } j+1 \le f(i)\}$. The branch language of $T_f$ is
$\{a^n b^{f(n)} \mid n \ge 0\}$.

Using a property of rational indexes of $k$-OI languages (see [10]), we obtain
the following proposition.

Proposition 4. If $\{a^n b^{f(n)} \mid n \in \mathbb{N}\}$ is recognized by a higher-order pushdown
automaton of level $k$ then $f \in O(2\uparrow^{k-1}(p(n)))$ for some polynomial $p$ where
$2\uparrow^{0}(n) = n$ and $2\uparrow^{k+1}(n) = 2^{2\uparrow^{k}(n)}$.

Let us consider the mapping $\exp_k(n) = 2\uparrow^{k}(n)$. It has been proved in [7]
that $T_{\exp_k}$ belongs to $\mathrm{Graph}(k+1)$. Note that using Theo. 3 the construction
given in [7] can be avoided by providing a deterministic higher-order pushdown
automaton of level $k+1$ that recognizes $\{a^n b^{\exp_k(n)} \mid n \in \mathbb{N}\}$.

It is natural to consider the "diagonal" mapping $\exp_\omega(n) = \exp_n(1)$. Figure 2
shows an initial segment of the tree associated to $\exp_\omega$. By Prop. 4, the associated
tree $T_{\exp_\omega}$ is not in the hierarchy. However, using techniques from [5], we can
prove that $T_{\exp_\omega}$ has a decidable MSO-theory.

Fig. 2. The graph $T_{\exp_\omega}$ of the function $\exp_\omega$



Proposition 5. There exists a graph with a decidable MSO-theory which is not 
in the Caucal hierarchy. 
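
For small arguments the mappings used in Section 5.2 can be computed directly. The sketch below assumes that the tower function is given by 2↑^0(n) = n and 2↑^(k+1)(n) = 2^(2↑^k(n)), that exp_ω(n) = exp_n(1), and that column i of the tree T_f carries f(i) vertical b-edges. The names `tower`, `exp_omega` and `tree_edges` are hypothetical and chosen for this example only.

```python
def tower(k, n):
    """Iterated exponential: tower(0, n) = n, tower(k + 1, n) = 2 ** tower(k, n)."""
    value = n
    for _ in range(k):
        value = 2 ** value
    return value


def exp_omega(n):
    """Diagonal mapping: exp_omega(n) = exp_n(1) = tower(n, 1)."""
    return tower(n, 1)


def tree_edges(f, width):
    """Edges of T_f restricted to the columns 0 .. width - 1."""
    # horizontal a-edges along the bottom row
    ea = {((i, 0), (i + 1, 0)) for i in range(width - 1)}
    # vertical b-edges: column i has f(i) of them
    eb = {((i, j), (i, j + 1)) for i in range(width) for j in range(f(i))}
    return ea, eb


first_values = [exp_omega(n) for n in range(5)]
ea, eb = tree_edges(exp_omega, 3)
```

The growth is visible immediately: the column heights of the first five columns are already 1, 2, 4, 16 and 65536.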

6 Conclusion 

We have given two characterizations of the Caucal hierarchy. We have shown that 
it coincides with the hierarchy obtained by alternating the treegraph operation 
and MSO-transductions, and thus have partly answered a question posed in 
[17]. It remains open whether one can extend this result to structures other than 
graphs, i.e. with symbols of higher arity. 

We have also characterized the Caucal hierarchy as the $\varepsilon$-closure of configuration
graphs of higher-order pushdown automata and have used this result to
obtain that the hierarchy is indeed strict, but does not contain all graphs with 
a decidable MSO-theory. 

Despite these characterization results we know surprisingly little about the
graphs obtained on level n > 2. This deserves further study. Also, a thorough
comparison with other methods to generate infinite graphs with a decidable
theory is still missing (see [17] for a more precise account of this).

Furthermore, we would like to mention that neither the constructions used to build
the hierarchy nor Proposition 5 contradicts Seese's conjecture that every infinite
graph (or every set of finite graphs) having a decidable MSO-theory is the image
of a tree (or a set of trees) under an MSO-transduction.

Finally, many of the questions posed in [7] on the corresponding hierarchy of
trees remain unsolved so far.

Abstract. We present an NP decision procedure for the formal analysis
of protocols in the presence of modular exponentiation with products allowed
in exponents. The number of factors that may appear in the products is 
unlimited. We illustrate that our model is powerful enough to uncover 
known attacks on the A-GDH.2 protocol suite. 



1 Introduction 

Most automatic analysis techniques for security protocols take as a simplifying 
hypothesis that the cryptographic algorithms are perfect: One needs the decryp- 
tion key to extract the plaintext from the ciphertext, and also, a ciphertext can 
be generated only with the appropriate key and message (no collision). Under 
these assumptions and given a bound on the number of protocol sessions, the 
insecurity problem is decidable (see, e.g., [1]). However, it is an open question 
whether this result remains valid when the intruder model is extended to take 
into account even simple properties of product or exponentiation operators, such 
as those of RSA and Diffie-Hellman Exponentiation. This question is important 
since many security flaws are the consequence of these properties and many 
protocols are based on these operators (see, e.g., [9]). 

Only recently the perfect encryption assumption for protocol analysis has 
been slightly relaxed. In [7], unification algorithms are designed for handling 
properties of Diffie-Hellman cryptographic systems. Although these results are 
useful, they do not solve the more general insecurity problem. In [5,6], decid- 
ability of security has been proved for protocols that employ exclusive or. When 
the XOR operator is replaced by an abelian group operator, decidability is men- 
tioned as an open problem by [8]. In [8], there is a reduction from the insecurity 
problem to solving quadratic equation systems. However, the satisfiability of 
these systems is in general undecidable. Hence, this approach fails to solve the 
initial insecurity problem. 

* This work was partially supported by PROCOPE and IST AVISPA. The second
author was also supported by the DFG.

In this paper, using non-trivial extensions of our technique in [5], we show
that the insecurity problem for protocols that use Diffie-Hellman exponentiation 
with products in exponents is NP-complete. Our model is powerful enough to 
uncover attacks on the A-GDH.2 protocol suite discovered in [9]. 

Boreale and Buscemi [3] have addressed a similar problem. However, in their 
paper among other restrictions they put an a priori bound on the number of 
factors that may occur in products. In the present paper we allow an unlimited 
number of factors. Also, Boreale and Buscemi do not provide a complexity re- 
sult. Diffie-Hellman exponentiation is also studied by Millen and Shmatikov [8]. 
Similar to our work, Millen and Shmatikov assume that products only occur in 
exponents. However, they do not provide a decision procedure. Also, they assume 
the base in exponentiations to be a fixed constant. This is a severe restriction 
since in general this rules out realistic attacks, even in case the protocol under 
consideration assumes a fixed basis. 

Structure of the paper. In Section 2, we introduce our protocol and intruder 
model. The decidability result is presented in Section 3, including the description 
of the NP decision algorithm, and an overview of the proof. Also, we point out 
the main differences to our proof presented in [5] for XOR. A very brief sketch 
of the main new part of the proof compared to the case for XOR is provided in 
Section 4. To illustrate our model, in Section 5 we formally specify the A-GDH.2 
protocol and present an attack on it discovered by Pereira and Quisquater. 

Full proofs of all results presented here can be found in our technical report [4] . 

2 The Protocol and Intruder Model 

The protocol and intruder model we describe here extends standard models for
automatic analysis of security protocols in two respects. First, messages can be
built using the operator $\mathrm{Exp}(\cdot,\cdot)$, which stands for Diffie-Hellman
exponentiation, and a product operator "$\cdot$" for multiplication in an abelian group.
Second, in addition to the standard Dolev-Yao intruder capabilities, the intruder
is equipped with the ability to perform Diffie-Hellman exponentiation. In what
follows, we provide a formal definition of our model by defining terms, messages,
protocols, the intruder, and attacks.

2.1 Terms and Messages 

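The set of terms $\mathit{term}$ is defined by the following grammar:

$$\mathit{term} ::= \mathcal{A} \mid \mathcal{V} \mid \langle \mathit{term}, \mathit{term} \rangle \mid \{\mathit{term}\}^s_{\mathit{term}} \mid \{\mathit{term}\}^p_{\mathcal{K}} \mid \mathrm{Exp}(\mathit{term}, \mathit{product})$$

$$\mathit{product} ::= \mathit{term}^{\mathbb{Z}} \mid \mathit{term}^{\mathbb{Z}} \cdot \mathit{product}$$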

where $\mathcal{A}$ is a finite set of constants (atomic messages), containing principal
names, nonces, keys, and the constants $1$ and secret; $\mathcal{K}$ is a subset of $\mathcal{A}$ denoting
the set of public and private keys; $\mathcal{V}$ is a finite set of variables; and $\mathbb{Z}$ is the set
of integers, the product exponents. We assume that there is a bijection $\cdot^{-1}$ on
$\mathcal{K}$ which maps every public (private) key $k$ to its corresponding private (public)
key $k^{-1}$. The binary symbols $\langle\cdot,\cdot\rangle$, $\{\cdot\}^s_{\cdot}$, and $\{\cdot\}^p_{\cdot}$ are called pairing, symmetric
encryption, and asymmetric encryption, respectively. Note that a symmetric key
can be any term and that for asymmetric encryption only atomic keys (namely,
public and private keys from $\mathcal{K}$) are allowed. The product operator "$\cdot$" models
multiplication in an abelian group, e.g., a subgroup $G$ of order $q$ of the
multiplicative group $\mathbb{Z}_p^*$ where $p$ and $q$ are prime numbers (as in the A-GDH.2 protocol).
Therefore, terms and products are read modulo commutativity and associativity
of the product operator as well as the identity $t^1 = t$, and are written without
parentheses. For instance, if $a$, $b$, and $c$ are atomic messages, then $a^2 \cdot b^3 \cdot c^{-2}$
and $a \cdot b \cdot a \cdot b \cdot c^{-1} \cdot b \cdot c^{-1}$ are considered the same products where $c^{-1}$ is the
inverse of $c$. The operator $\mathrm{Exp}(\cdot,\cdot)$ stands for Diffie-Hellman exponentiation. For
instance, $\mathrm{Exp}(a, b^2 \cdot c^{-1})$ is $a$ raised to the power of $b^2 \cdot c^{-1}$. Note that products
only occur in exponents.

If $t, t_1, \ldots, t_n$ are terms with $n \ge 2$, then we call a product of the form $t^z$
for some $z \ne 1$ or a product of the form $t_1^{z_1} \cdots t_n^{z_n}$ a non-standard term where
the $z, z_1, \ldots, z_n$ are integers. By abuse of terminology, we refer to a term or a
product as a "term". We say standard term to distinguish a term from a
non-standard term. Note that in a product of the form $t_1^{z_1} \cdots t_n^{z_n}$, the terms $t_i$ are
standard terms.

Variables are denoted by $x, y, \ldots$, terms are denoted by $s, t, u, v$, finite sets
of terms are written $E, F, \ldots$, and decorations thereof, respectively. For a term $t$
and a set of terms $E$, we refer to the set of variables occurring in $t$ and $E$ by
$\mathcal{V}(t)$ and $\mathcal{V}(E)$, respectively. For some set $S$, we denote the cardinality of $S$ by
$\mathrm{Card}(S)$.

A ground term (also called message) is a term without variables. We use
the expressions standard and non-standard messages as in the case of terms.
A (ground) substitution is a mapping from $\mathcal{V}$ into the set of standard (ground)
terms. The application of a substitution $\sigma$ to a term $t$ (a set of terms $E$) is
written $t\sigma$ ($E\sigma$), and is defined as expected.

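The set of subterms of a term $t$, denoted by $S(t)$, is defined as follows:

- If $t \in \mathcal{A}$ or $t \in \mathcal{V}$, then $S(t) = \{t\}$.

- If $t = \langle u, v \rangle$, $\{u\}^s_v$, or $\{u\}^p_v$, then $S(t) = \{t\} \cup S(u) \cup S(v)$.

- If $t = \mathrm{Exp}(u, t_1^{z_1} \cdots t_p^{z_p})$, then $S(t) = \{t\} \cup S(u) \cup \bigcup_i S(t_i)$.

- If $t = t_1^{z_1} \cdots t_p^{z_p}$, then $S(t) = \{t\} \cup \bigcup_i S(t_i)$.

We define $S(E) = \bigcup \{S(t) \mid t \in E\}$. Note that $\mathrm{Exp}(a, b^2 \cdot c^1)$ and $b^2 \cdot c^1 \cdot d^1$ are
not subterms of $\mathrm{Exp}(a, b^2 \cdot c^1 \cdot d^1)$. We define the set $S_{ext}(t)$ of extended subterms
of $t$ to be $S_{ext}(t) = S(t) \cup \{M \mid \mathrm{Exp}(u, M) \in S(t)\}$. For instance, $b^2 \cdot c^1 \cdot d^1$ is
an extended subterm of $\mathrm{Exp}(a, b^2 \cdot c^1 \cdot d^1)$, but $b^2 \cdot c^1$ is not.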

We define the DAG size $|t|_{dag}$ of $t$ to be $\mathrm{Card}(S_{ext}(t))$. Note that the DAG
size does not take into account the size needed to represent the product
exponents occurring in a term. We define the product exponent size $|t|_{exp}$ of $t$ to be
$\sum_{t_1^{z_1} \cdots t_n^{z_n} \in S_{ext}(t)} (|z_1| + \cdots + |z_n|)$ where $|z_i|$ is the number of bits needed to
represent the integer $z_i$ in binary. Finally, we define the size $\|t\|$ of $t$ to be
$|t|_{dag} + |t|_{exp}$. For a set of terms $E$ the size is defined in the same way (replace $t$
by $E$ in the above definitions). Let $S_{ext}(\sigma) = \{t \mid t \in S_{ext}(\sigma(x)), x \in \mathcal{V}\}$ for a
substitution $\sigma$. We define $|\sigma|_{dag} := |S_{ext}(\sigma)|_{dag}$ to be the DAG size of $\sigma$,
$|\sigma|_{exp} := |S_{ext}(\sigma)|_{exp}$ to be the product exponent size of $\sigma$, and
$\|\sigma\| := |\sigma|_{dag} + |\sigma|_{exp}$ to be the size of $\sigma$.
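
The subterm and size definitions above can be sketched in a few lines of Python. Everything about the encoding is an assumption of this example: atoms and variables are strings, pairs and encryptions are tagged triples, `("exp", base, factors)` stands for an exponentiation whose exponent is the product given by `factors`, a tuple of (term, integer) pairs, and `("prod", factors)` is a standalone product. The bit length of an exponent is taken without a sign bit, which is one possible reading of "number of bits".

```python
def subterms(t):
    """S(t) for terms encoded as strings or tagged tuples (hypothetical encoding)."""
    if isinstance(t, str):                       # atom or variable
        return {t}
    tag = t[0]
    if tag in ("pair", "senc", "aenc"):
        return {t} | subterms(t[1]) | subterms(t[2])
    if tag == "exp":                             # ("exp", base, factors)
        result = {t} | subterms(t[1])
        for factor, _ in t[2]:
            result |= subterms(factor)
        return result
    if tag == "prod":                            # ("prod", factors)
        result = {t}
        for factor, _ in t[1]:
            result |= subterms(factor)
        return result
    raise ValueError(f"unknown term: {t!r}")


def extended_subterms(t):
    """S_ext(t): subterms plus the full exponent product of every Exp subterm."""
    s = subterms(t)
    return s | {("prod", u[2]) for u in s
                if not isinstance(u, str) and u[0] == "exp"}


def dag_size(t):
    return len(extended_subterms(t))


def bits(z):
    # assumption: magnitude bits only, at least one bit
    return max(1, abs(z).bit_length())


def exp_size(t):
    return sum(bits(z)
               for u in extended_subterms(t)
               if not isinstance(u, str) and u[0] == "prod"
               for _, z in u[1])


t = ("exp", "a", (("b", 2), ("c", 1), ("d", 1)))
size = dag_size(t) + exp_size(t)
```

For this term the extended subterms are the term itself, the atoms a, b, c, d and the full exponent product, while partial products of the exponent are not included.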

We now formulate the algebraic properties of terms. Besides commutativity
and associativity of the product operator we consider the following properties:

$$t^1 = t \qquad t \cdot 1 = t$$

$$t^0 = 1 \qquad 1^z = 1$$

$$t^z \cdot t^{z'} = t^{z+z'}$$

$$\mathrm{Exp}(t, 1) = t \qquad \mathrm{Exp}(\mathrm{Exp}(t, t'), t'') = \mathrm{Exp}(t, t' \cdot t'')$$

where $z$ and $z'$ are integers. A normal form $\ulcorner t \urcorner$ of a term $t$ is obtained by
exhaustively applying these identities from left to right. Note that $\ulcorner t \urcorner$ is uniquely
determined up to commutativity and associativity of the product operator. Two
terms $t$ and $t'$ are equivalent if $\ulcorner t \urcorner = \ulcorner t' \urcorner$. The notion of normal form extends in
the obvious way to sets of terms and substitutions.

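Let us illustrate the notion of a normal form by some examples: If $a, b, c, d \in \mathcal{A}$,
then

- $\ulcorner (a^2 \cdot b^1) \cdot b^{-2} \urcorner = a^2 \cdot b^{-1}$,

- $\ulcorner \mathrm{Exp}(\mathrm{Exp}(a, b^1 \cdot c^1), c^{-1} \cdot d^{-2}) \urcorner = \mathrm{Exp}(a, b^1 \cdot d^{-2})$, and

- $\ulcorner \mathrm{Exp}(\mathrm{Exp}(a, b^3 \cdot c^{-6} \cdot b^{-3}), c^6) \urcorner = a$.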
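
Normalization can be prototyped by collecting exponents per factor. The sketch below is an illustration under assumptions, not the procedure of the paper: terms are encoded as strings and tagged tuples, `("exp", base, factors)` and `("prod", factors)` carry a tuple of (term, integer) pairs, and factors are put into a canonical order by sorting on `repr` so that equality modulo commutativity and associativity becomes plain tuple equality.

```python
ONE = "1"  # the constant 1


def norm_product(factors):
    """Merge equal factors, drop 1 and zero exponents, return a sorted tuple."""
    acc = {}
    for term, z in factors:
        term = normalize(term)
        if term != ONE:                          # 1^z = 1 and t . 1 = t
            acc[term] = acc.get(term, 0) + z     # t^z . t^z' = t^(z+z')
    kept = [(t, z) for t, z in acc.items() if z != 0]   # t^0 = 1
    return tuple(sorted(kept, key=repr))


def normalize(t):
    if isinstance(t, str):
        return t
    tag = t[0]
    if tag in ("pair", "senc", "aenc"):
        return (tag, normalize(t[1]), normalize(t[2]))
    if tag == "prod":
        prod = norm_product(t[1])
        if not prod:
            return ONE
        if len(prod) == 1 and prod[0][1] == 1:
            return prod[0][0]                    # t^1 = t
        return ("prod", prod)
    if tag == "exp":
        base, factors = normalize(t[1]), list(t[2])
        if not isinstance(base, str) and base[0] == "exp":
            # Exp(Exp(t, t'), t'') = Exp(t, t' . t'')
            factors += list(base[2])
            base = base[1]
        prod = norm_product(factors)
        return base if not prod else ("exp", base, prod)   # Exp(t, 1) = t
    raise ValueError(f"unknown term: {t!r}")


n1 = normalize(("prod", (("a", 2), ("b", 1), ("b", -2))))
n2 = normalize(("exp", ("exp", "a", (("b", 1), ("c", 1))), (("c", -1), ("d", -2))))
n3 = normalize(("exp", ("exp", "a", (("b", 3), ("c", -6), ("b", -3))), (("c", 6),)))
```

The three calls at the end exercise exponent merging, flattening of nested exponentiations, and complete cancellation of an exponent.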

The following lemma summarizes basic properties of normal forms. 

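Lemma 1. For every term $t$, $t'$, and substitution $\sigma$ the following holds true:

1. $|\ulcorner t \urcorner|_{dag} \le |t|_{dag}$,

2. $|\ulcorner t \urcorner|_{exp} \le |t|_{exp}$,

3. $\|\ulcorner t \urcorner\| \le \|t\|$, and

4. $\ulcorner t\sigma \urcorner = \ulcorner \ulcorner t \urcorner \sigma \urcorner = \ulcorner t \ulcorner \sigma \urcorner \urcorner = \ulcorner \ulcorner t \urcorner \ulcorner \sigma \urcorner \urcorner$.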



2.2 Protocols 

The following definition is explained below. 

Definition 1. A protocol rule is of the form $R \Rightarrow S$ where $R$ and $S$ are standard
terms.

A protocol $P$ is a tuple $(\{R_i \Rightarrow S_i, i \in \mathcal{I}\}, <_{\mathcal{I}}, E)$ where $E$ is a finite
normalized set of standard messages with $1 \in E$, the initial intruder knowledge, $\mathcal{I}$
is a finite (index) set, $<_{\mathcal{I}}$ is a partial ordering on $\mathcal{I}$, and $R_i \Rightarrow S_i$, for every
$i \in \mathcal{I}$, is a protocol rule such that

1. the (standard) terms $R_i$ and $S_i$ are normalized,

2. for all $x \in \mathcal{V}(S_i)$, there exists $j \le_{\mathcal{I}} i$ such that $x \in \mathcal{V}(R_j)$, and

3. for every subterm $\mathrm{Exp}(t, t_1^{z_1} \cdots t_n^{z_n})$ of $R_i$, there exists $r \in \{1, \ldots, n\}$ such
that $\mathcal{V}(t_l) \subseteq \bigcup_{j <_{\mathcal{I}} i} \mathcal{V}(R_j)$ for every $l \in \{1, \ldots, n\} \setminus \{r\}$.

A bijective mapping $\pi : \mathcal{I}' \to \{1, \ldots, k\}$ is called execution ordering for $P$ if
$\mathcal{I}' \subseteq \mathcal{I}$, $k$ is the cardinality of $\mathcal{I}'$, and for all $i, j$ we have that if $i <_{\mathcal{I}} j$ and $\pi(j)$
is defined, then $\pi(i)$ is defined and $\pi(i) < \pi(j)$. We define the size of $\pi$ to be $k$.
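
The conditions on an execution ordering are easy to check mechanically. In the following sketch the function name `is_execution_ordering`, the dictionary encoding of π, and the representation of the partial order as a set of pairs (i, j) meaning "i is before j" are assumptions made for illustration; the set of pairs is expected to be transitively closed.

```python
def is_execution_ordering(pi, index_set, less_than):
    """Check that `pi` (dict: rule index -> position) is an execution ordering."""
    k = len(pi)
    # the domain of pi must be a subset of the index set
    if not set(pi) <= set(index_set):
        return False
    # pi must be a bijection onto {1, ..., k}
    if sorted(pi.values()) != list(range(1, k + 1)):
        return False
    # if j is scheduled, every predecessor i must be scheduled earlier
    for i, j in less_than:
        if j in pi and (i not in pi or pi[i] >= pi[j]):
            return False
    return True


order = {(1, 2)}                      # rule 1 must precede rule 2
ok = is_execution_ordering({1: 1, 2: 2}, {1, 2, 3}, order)
```

Note that an execution ordering need not schedule every rule: with the order above, scheduling only rule 3 is allowed, while scheduling rule 2 without rule 1 is not.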

Given a protocol $P$, in the following we will assume that $\mathcal{A}$ is the set of constants
occurring in $P$. We define $S(P) := S(E \cup \bigcup_{i \in \mathcal{I}} \{R_i, S_i\})$ to be the set of subterms
of $P$ and $\mathcal{V} := \mathcal{V}(P)$ to be the set of variables occurring in $P$. The set of extended
subterms $S_{ext}(P)$, the DAG size $|P|_{dag}$, the product exponent size $|P|_{exp}$, and
the size $\|P\|$ of $P$ are defined as expected.

Intuitively, when executing a rule $R_i \Rightarrow S_i$ and on receiving a (normalized)
message $m$ in a protocol run, it is first checked whether $m$ and $R_i$ match, i.e.,
whether there exists a ground substitution $\sigma$ such that $m = \ulcorner R_i\sigma \urcorner$. If so, $\ulcorner S_i\sigma \urcorner$
is returned as output. We always assume that the messages exchanged between
principals (and the intruder) are normalized; therefore, $m$ is assumed to be
normalized and the output of the above rule is not $S_i\sigma$ but $\ulcorner S_i\sigma \urcorner$. This is
because principals and the intruder cannot distinguish between equivalent terms,
and therefore, they may only work on normalized terms (representing the
corresponding equivalence class of terms). Finally, we note that since the different
protocol rules may share variables, the substitution of some of the variables in
$R_i$ and $S_i$ may be known already.

Condition 1. in the above definition is not a real restriction since due to 
Lemma 1, the transformation performed by a protocol rule and its normalized 
variant coincide. 

Condition 2. guarantees that when an output is produced with $S_i$, the substitution
for all variables in $S_i$ is determined beforehand. Otherwise, the output
of a protocol rule would be arbitrary since if a variable occurs in $S_i$ for the first
time, it could be mapped to any message.

Condition 3. guarantees that every single protocol rule can be carried
out deterministically. For instance, this rules out protocol rules of the form
$\mathrm{Exp}(g, x \cdot y) \Rightarrow \mathrm{Exp}(g, x \cdot y \cdot b)$ in case neither $x$ nor $y$ is known beforehand (i.e.,
the substitution of neither $x$ nor $y$ is determined by a previous application of a
rule). Otherwise, when say the principal receives $\mathrm{Exp}(g, a \cdot b)$, there are different
ways of matching this term with $\mathrm{Exp}(g, x \cdot y)$. In our technical report [4], we
explain why condition 3. does not rule out deterministic protocols. In a nutshell,
such protocols can be turned into equivalent protocols (w.r.t. attacks possible)
with rules that can be carried out deterministically. Let us illustrate the
transformation by a simple example protocol with one principal $A$ performing two
steps $A_1$ and $A_2$:

$A_1$: $\mathrm{Exp}(g, x \cdot y) \Rightarrow \mathit{hello}$

$A_2$: $y \Rightarrow \mathrm{Exp}(g, x)$

This protocol does not satisfy condition 3. because neither $x$ nor $y$ is known
when executing $A_1$. However, although $A_1$ cannot be performed deterministically,
overall the protocol is deterministic since in $A_2$ principal $A$ gets to know
$y$, and thus, can determine $\mathrm{Exp}(g, x)$. This protocol can be turned into the
following protocol:

$A_1$: $z \Rightarrow \mathit{hello}$

$A_2$: $y \Rightarrow \mathrm{Exp}(z, y^{-1})$

Every rule in this description of $A$ can be carried out deterministically without
having to wait for messages expected in subsequent steps. In particular, condition
3. is met. This kind of transformation, which in some cases might require
introducing auxiliary steps, can be applied for every deterministic protocol. Since
the transformations are linear, our NP-completeness result for the insecurity of
protocols is preserved.
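
A small numeric check shows why the transformed second step, which outputs Exp(z, y^{-1}), yields the same value as Exp(g, x) when z was sent as Exp(g, x · y). The parameters below (p = 23, q = 11, g = 4, x = 3, y = 5) are toy values chosen for this example only, and exponents are computed modulo the group order q. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
p, q = 23, 11          # toy primes with q dividing p - 1 (assumed parameters)
g = 4                  # element of order q in the multiplicative group mod p
x, y = 3, 5            # exponents, read modulo the group order q

# message received in step A1: g^(x*y)
z = pow(g, (x * y) % q, p)

# step A2: the principal learns y and computes Exp(z, y^-1)
y_inv = pow(y, -1, q)              # inverse of y modulo q
recovered = pow(z, y_inv, p)

expected = pow(g, x, p)            # Exp(g, x) from the original description
```

The principal never needs to know how z was built; it only raises the received bit string to the power y^{-1}.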

We also point out that the transformations yield more realistic descriptions of 
protocols in that the actual operations carried out by the principals are formal- 
ized. The improved description of A illustrates this. In the original description 
of A, one implicitly assumes that A can check whether the message expected in 
$A_1$ is of the form $\mathrm{Exp}(g, x \cdot y)$, i.e., is obtained by exponentiation. However, this
is unrealistic since in practice A merely obtains a bit string (which in the second 
description is z) from which she cannot tell how it was created. The original de- 
scription of A might therefore exclude possible attacks. We note that the second 
description of A requires that the exponentiation base may be any message (in 
the example some message substituted for z). This more realistic version of A 
could therefore not be formulated when a fixed base is assumed, as is the case 
in [8]. 



2.3 The Intruder Model and Attacks 

Our intruder model is based on the Dolev-Yao intruder commonly employed in 
automatic protocol analysis. That is, the intruder has complete control over the 
network and can derive new messages from his knowledge by composition, de- 
composition, encryption, and decryption of messages. In addition to the standard 
Dolev-Yao intruder we equip the intruder with the ability to perform exponen- 
tiation (see 4. below). 

Given a finite normalized set $E$ of messages, the (infinite) set $\mathit{forge}(E)$ of
messages the intruder can derive from $E$ is the smallest set satisfying the following
conditions:¹

1. $E \subseteq \mathit{forge}(E)$.

2. If $\langle m, m' \rangle \in \mathit{forge}(E)$, then $m \in \mathit{forge}(E)$ and $m' \in \mathit{forge}(E)$. Conversely, if
$m, m' \in \mathit{forge}(E)$, then $\langle m, m' \rangle \in \mathit{forge}(E)$.

3. If $\{m\}^s_{m'} \in \mathit{forge}(E)$ and $m' \in \mathit{forge}(E)$, then $m \in \mathit{forge}(E)$. Conversely,
if $m, m' \in \mathit{forge}(E)$, then $\{m\}^s_{m'} \in \mathit{forge}(E)$. Analogously for asymmetric
encryption.

4. If $m, m_1, \ldots, m_n \in \mathit{forge}(E)$, then $\ulcorner \mathrm{Exp}(m, m_1^{z_1} \cdots m_n^{z_n}) \urcorner \in \mathit{forge}(E)$ for
all integers $z_1, \ldots, z_n$.

¹ In [4] we use an equivalent definition of $\mathit{forge}(E)$ based on derivations.

We call the intruder with these capabilities to derive messages the DH intruder
where DH stands for Diffie-Hellman. Note that if $E$ is a set of normalized
messages, then so is $\mathit{forge}(E)$.
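
To give a feeling for how membership in forge(E) can be tested, here is a sketch for a restricted intruder that only knows pairing and symmetric encryption (conditions 1 to 3); asymmetric encryption and the exponentiation rule 4 are deliberately left out. This is the usual decompose-then-compose approach for the plain Dolev-Yao setting, not the DH algorithm of the technical report, and the tuple encoding and the names `synth`, `analyze` and `derivable` are assumptions of this example.

```python
def synth(m, known):
    """Can m be composed from `known` by pairing and symmetric encryption?"""
    if m in known:
        return True
    if isinstance(m, tuple) and m[0] in ("pair", "senc"):
        return synth(m[1], known) and synth(m[2], known)
    return False


def analyze(E):
    """Saturate E under projection and decryption with derivable keys."""
    known = set(E)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            if not isinstance(m, tuple):
                continue
            if m[0] == "pair":                       # ("pair", left, right)
                parts = [m[1], m[2]]
            elif m[0] == "senc" and synth(m[2], known):   # ("senc", body, key)
                parts = [m[1]]
            else:
                parts = []
            for part in parts:
                if part not in known:
                    known.add(part)
                    changed = True
    return known


def derivable(m, E):
    return synth(m, analyze(E))


E = {("senc", "secret", "k"), ("pair", "k", "n")}
found = derivable("secret", E)
```

In the example the intruder first projects the pair to obtain the key k and then decrypts the ciphertext; without the pair the secret stays out of reach.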

We now define attacks. In an attack on a protocol P, the intruder (nonde- 
terministically) chooses some execution order for P and then tries to produce 
input messages for the protocol rules. These input messages are derived from 
the intruder’s initial knowledge and the output messages produced by executing 
the protocol rules. The aim of the intruder is to derive the message secret. 

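Definition 2. Let $P = (\{R'_j \Rightarrow S'_j \mid j \in \mathcal{I}\}, <_{\mathcal{I}}, S_0)$ be a protocol. Then an
attack on $P$ is a tuple $(\pi, \sigma)$ where $\pi$ is an execution ordering on $P$ and $\sigma$ is a
normalized ground substitution of the variables occurring in $P$ such that

1. $\ulcorner R_i\sigma \urcorner \in \mathit{forge}(\ulcorner S_0, S_1\sigma, \ldots, S_{i-1}\sigma \urcorner)$ for every $i \in \{1, \ldots, k\}$ where $k$ is the size
of $\pi$, $R_i := R'_{\pi^{-1}(i)}$, and $S_i := S'_{\pi^{-1}(i)}$, and

2. $\mathit{secret} \in \mathit{forge}(\ulcorner S_0, S_1\sigma, \ldots, S_k\sigma \urcorner)$.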

Due to Lemma 1, it does not matter whether, in the above definition, $\sigma$ is
normalized or not. Note that
$\ulcorner \mathit{forge}(\ulcorner S_0, S_1\sigma, \ldots, S_{i-1}\sigma \urcorner) \urcorner = \mathit{forge}(\ulcorner S_0, S_1\sigma, \ldots, S_{i-1}\sigma \urcorner)$
since from normalized messages the intruder only derives normalized messages.
The decision problem we are interested in is the following set of protocols:

Insecure := $\{P \mid \text{there exists an attack on } P\}$.

Later we will consider minimal attacks. 

Definition 3. Let $P = (\{R_i \Rightarrow S_i, i \in \mathcal{I}\}, <_{\mathcal{I}}, S_0)$ be a protocol. An attack
$(\pi, \sigma)$ is minimal if $\sum_{x \in \mathcal{V}} |\sigma(x)|_{dag}$ is minimal.

Clearly, if there is an attack, there is a (not necessarily uniquely determined) 
minimal attack. 



3 Main Theorem and the NP Decision Algorithm 

We now state the main theorem of this paper. 

Theorem 1. For the DH intruder, the problem Insecure is NP-complete. 

NP-hardness can easily be established (see for instance [1]). 

In Figure 1, an NP decision procedure for Insecure is presented where $p$
is a polynomial.² It guesses some tuple $(\pi, \sigma)$, steps 1. and 2., and then checks
whether $(\pi, \sigma)$ is an attack on $P$, steps 3. and 4.

² In the technical report we have more fine-grained measures for the size of terms and
substitutions and more precise bounds. The exact definition of $p$ follows from the
proof and is omitted here.

Input: protocol $P = (\{R'_j \Rightarrow S'_j, j \in \mathcal{I}\}, <_{\mathcal{I}}, S_0)$

1. Guess an execution ordering $\pi$ for $P$. Let $k$, $R_i$, and $S_i$ be defined as in
Definition 2.

2. Guess a normalized ground substitution $\sigma$ such that $\|\sigma\| \le p(\|P\|)$.

3. Test that $\ulcorner R_i\sigma \urcorner \in \mathit{forge}(\ulcorner \{S_j\sigma \mid j < i\} \cup \{S_0\} \urcorner)$ for every $i \in \{1, \ldots, k\}$.

4. Test $\mathit{secret} \in \mathit{forge}(\ulcorner \{S_j\sigma \mid j < k+1\} \cup \{S_0\} \urcorner)$.

5. If each test is successful, then answer "yes", and otherwise, "no".

Fig. 1. An NP Decision Procedure for Insecure

Obviously this procedure is sound. It remains to show completeness and it
remains to show that the procedure can in fact be carried out in non-deterministic
polynomial time. To show the latter, we consider the decision problem

DERIVE := $\{(E, m) \mid m \in \mathit{forge}(E)\}$

where $E$ is a finite set of normalized standard messages and $m$ is a normalized
standard message (both given as DAG). In [4], we show:

Proposition 1. For the DH intruder, DERIVE can be decided in deterministic
polynomial time.

Given that the size of $\sigma$ is polynomially bounded in the size of $P$, we can conclude
from the above proposition that the tests in 3. and 4. of the algorithm can be 
performed in deterministic polynomial time in the size of P. Thus, the procedure 
presented in Figure 1 is an NP decision procedure. The basic idea behind the 
proof of Proposition 1 is to show that i) to derive a message m from E the 
intruder only needs to derive subterms of E and m in the intermediate steps, 
and that ii) every derivation step can be carried out in polynomial time. 

It remains to show completeness of the procedure in Figure 1. This requires
showing that if there exists an attack on $P$, then there exists an attack $(\pi, \sigma)$ on
$P$ such that $\|\sigma\| \le p(\|P\|)$. This is done in two steps: We prove that

1. $|\sigma|_{dag} \le p'(|P|_{dag})$ for some polynomial $p'$, and

2. $|\sigma|_{exp} \le p''(\|P\|)$ for some polynomial $p''$.

The proof of 1. follows the same lines as the one in [5] where protocols with XOR
were considered. However, due to the richer algebraic properties of the Diffie-Hellman
exponentiation operator, the proof is more involved. Roughly speaking, the main
idea of the proof is to show that for a minimal attack $(\pi, \sigma)$ on the protocol $P$,
if the top symbol of $\sigma(x)$ is an atom, a pair, or encryption, then there exists a
$t \in S(P)$ such that $\ulcorner t\sigma \urcorner = \sigma(x)$. Also, if $\sigma(x)$ is of the form $\mathrm{Exp}(t_0, t_1^{z_1} \cdots t_n^{z_n})$,
then for every $t_i$ there exists a $t \in S(P)$ such that $\ulcorner t\sigma \urcorner = t_i$. In other words,
(parts of) $\sigma$ are built from subterms of $P$. As an immediate consequence, the
number $n$ of factors $t_i$, $i = 1, \ldots, n$, in products can be bounded in the DAG
size of $P$. More generally, we can prove that the DAG size of $\sigma$ can be bounded
by a polynomial in the DAG size of $P$ (see [4] for details). Note that this does
not bound the size of the $z_i$.








For XOR, to bound the size of $\sigma$, only 1. needed to be shown since the
number of subterms of $\sigma$ bounds the size of XOR terms occurring in $\sigma$ due to
the nilpotence of XOR: An XOR term can be considered a product of the form
$t_1^{z_1} \cdots t_n^{z_n}$ where the $z_i$ are either 0 or 1, and hence, there is no need to bound
the $z_i$. For the terms considered here this is no longer true, and as mentioned,
from 1. we cannot derive a bound on the size of the product exponents $z_i$. Thus,
deciding Insecure in the presence of Diffie-Hellman exponentiation requires new
techniques. The main idea behind the proof of 2. is presented in Section 4.

4 Bounding the Product Exponent Size of Attacks 

In order to bound $|\sigma|_{exp}$, we will associate with a minimal attack $(\pi, \sigma)$ on a
protocol $P$ a substitution $\sigma^Z$ and a (solvable) linear equation system $\mathcal{E}$ such
that $\sigma^Z$ coincides with $\sigma$ except that the product exponents in $\sigma$ are replaced
by new (integer) variables and such that for every solution $\beta$ of $\mathcal{E}$, the tuple
$(\pi, \sigma')$, where $\sigma'$ is obtained by replacing in $\sigma^Z$ the variables according to $\beta$, is
an attack on $P$. Since the size of the linear equation system can be bounded
polynomially in the size of $P$, and thus, the size of the minimal solutions of this
equation system can be bounded polynomially, we obtain an attack with product
exponents polynomially bounded in the size of $P$ (see Proposition 2).

We first need to define messages that may have linear expressions as product 
exponents. 

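Definition 4. Let $Z$ be a set of variables. The set $\mathcal{M} = \mathcal{M}(Z)$ of open messages
over $Z$, the set $\mathcal{P} = \mathcal{P}(Z)$ of open products over $Z$, and the set $\mathcal{L}_{exp} = \mathcal{L}_{exp}(Z)$
of linear expressions over $Z$ are defined by the following grammar:

$$\mathcal{M} ::= \mathcal{A} \mid \langle \mathcal{M}, \mathcal{M} \rangle \mid \{\mathcal{M}\}^s_{\mathcal{M}} \mid \{\mathcal{M}\}^p_{\mathcal{K}} \mid \mathrm{Exp}(\mathcal{M}, \mathcal{P})$$

$$\mathcal{P} ::= \mathcal{M}^{\mathcal{L}_{exp}} \mid \mathcal{M}^{\mathcal{L}_{exp}} \cdot \mathcal{P}$$

$$\mathcal{L}_{exp} ::= Z \mid \mathbb{Z} \mid \mathcal{L}_{exp} + \mathcal{L}_{exp} \mid \mathbb{Z} \cdot \mathcal{L}_{exp}$$

The size $|e|$ of a linear expression $e$ is the number of characters needed to
represent $e$ where integers are encoded in binary. The set of (extended) subterms,
the DAG size, and the product exponent size of open messages and products are
defined analogously to terms.

We call a mapping $\beta : Z \to \mathbb{Z}$ an evaluation mapping. The evaluation $\beta(e) \in \mathbb{Z}$
of a linear expression $e$ w.r.t. $\beta$ is defined as expected. The evaluation mapping
$\beta$ extends in the obvious way to open messages, open products, sets of open
messages, substitutions, etc. A linear equation system $\mathcal{E}$ (over $Z$) is a finite set
of equations of the form $e = e'$ where $e$ and $e'$ are linear expressions over $Z$, e.g.,
$e = 3 \cdot z - 5 \cdot z'$. The size $|\mathcal{E}|$ of $\mathcal{E}$ is $\sum_{e = e' \in \mathcal{E}} (|e| + |e'|)$. An evaluation mapping $\beta$
is a solution of $\mathcal{E}$ ($\beta \models \mathcal{E}$) if $\beta(e) = \beta(e')$ for every equation $e = e' \in \mathcal{E}$.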
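
Evaluation mappings and the solution relation can be written down directly. In the following sketch linear expressions are encoded as Python integers, variable names (strings), sums `("+", e1, e2)` and scalar multiples `("*", c, e)` with an integer coefficient c; this encoding and the names `evaluate` and `is_solution` are assumptions made for this example.

```python
def evaluate(e, beta):
    """Evaluate a linear expression under the evaluation mapping `beta` (a dict)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):                 # a variable from Z
        return beta[e]
    op, left, right = e
    if op == "+":
        return evaluate(left, beta) + evaluate(right, beta)
    if op == "*":                          # integer coefficient times expression
        return left * evaluate(right, beta)
    raise ValueError(f"unknown expression: {e!r}")


def is_solution(beta, system):
    """beta is a solution if both sides of every equation evaluate equally."""
    return all(evaluate(lhs, beta) == evaluate(rhs, beta) for lhs, rhs in system)


# the single equation 3*z1 + (-5)*z2 = 1
system = [(("+", ("*", 3, "z1"), ("*", -5, "z2")), 1)]
solved = is_solution({"z1": 2, "z2": 1}, system)
```

Subtraction is expressed through a negative coefficient, which matches a grammar that only offers sums and integer multiples.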

The key for bounding the product exponent size of attacks is the following 
lemma. 

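Lemma 2. Let $t, t_1, \ldots, t_n$ be open messages and $\beta$ be an evaluation mapping
such that $\ulcorner \beta(t) \urcorner \in \mathit{forge}(\ulcorner \beta(t_1) \urcorner, \ldots, \ulcorner \beta(t_n) \urcorner)$. Then, there exists an extension of
$\beta$ (again denoted $\beta$) and an equation system $\mathcal{E}$ such that

1. $\beta \models \mathcal{E}$,

2. $\ulcorner \beta'(t) \urcorner \in \mathit{forge}(\ulcorner \beta'(t_1) \urcorner, \ldots, \ulcorner \beta'(t_n) \urcorner)$ for every $\beta' \models \mathcal{E}$, and

3. the size of $\mathcal{E}$ is polynomially bounded in $\|t_1, \ldots, t_n, t\|$.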

The proof of this lemma is quite involved. The basic idea is to replace the
messages in a derivation $D$ from $\ulcorner \beta(t_1) \urcorner, \ldots, \ulcorner \beta(t_n) \urcorner$ to $\ulcorner \beta(t) \urcorner$ by open messages
which coincide with the messages in $D$ except for the product exponents. More
precisely, one first turns the $t_i$ into open messages $t'_i$, which we call $\beta$-normal
forms, such that $\beta(t'_i) = \ulcorner \beta(t_i) \urcorner$. Now, one simulates the derivation $D$ in a
(symbolic) derivation $D'$ starting with the $\beta$-normal forms $t'_1, \ldots, t'_n$. The
intermediate terms obtained in $D'$ are $\beta$-normal forms of the corresponding terms in $D$.
The equation system $\mathcal{E}$ evolves in the process of turning the $t_i$ into $\beta$-normal
forms and simulating $D$.

Using this lemma, it is now rather easy to prove: 

Proposition 2. There exists a polynomial p'' such that for every protocol P, if
$(\pi, \sigma)$ is a minimal attack on P, then there exists an attack
$(\pi, \sigma')$ on P such that $\sigma$ and $\sigma'$ coincide up to the
product exponents and such that $|\sigma'|_{exp} \leq p''(\|P\|)$.

Note that together with the polynomial bound on the DAG size of $\sigma$
(Section 3), this proposition implies that the size $\|\sigma\|$ of $\sigma$
can polynomially be bounded in the size of P. This establishes completeness of
the algorithm in Figure 1.

The proof sketch of Proposition 2 is as follows: Let $(\pi, \sigma)$ be a
minimal attack on P and let $\sigma^Z$ be $\sigma$ where all product exponents
are replaced by new variables. Let $\beta$ assign to each of these variables
the corresponding product exponent, and thus, $\beta(\sigma^Z) = \sigma$. Then,
we apply Lemma 2 to the case where $t = R_i\sigma^Z$ and $t_j = S_j\sigma^Z$
for every $j \in \{0, \ldots, i-1\}$. (W.l.o.g. $S_0$ can be considered a
single message instead of a finite set of messages. We set
$R_{k+1} = secret$.) For every $i \in \{0, \ldots, k+1\}$, Lemma 2 yields an
equation system $\mathcal{E}_i$ such that $\beta \models \mathcal{E}_i$ and
$\ulcorner\beta'(R_i\sigma^Z)\urcorner \in forge(\ulcorner\beta'(S_0\sigma^Z)\urcorner, \ldots, \ulcorner\beta'(S_{i-1}\sigma^Z)\urcorner)$
for every $\beta' \models \mathcal{E}_i$. Let $\mathcal{E}$ be the union of
these equation systems. With Lemma 2 and using that the DAG size of $\sigma$
can polynomially be bounded, it is easy to conclude that the size of
$\mathcal{E}$ is polynomially bounded in the size of P. Since
$\beta \models \mathcal{E}$, we know that $\mathcal{E}$ is solvable. By [2],
there exists a solution $\beta'$ of $\mathcal{E}$ such that the binary
representation of the integers can polynomially be bounded in the size of
$\mathcal{E}$, and thus, polynomially bounded in $\|P\|$. We define $\sigma'$
to be $\beta'(\sigma^Z)$ and obtain an attack $(\pi, \sigma')$ as required.



5 The A-GDH.2 Protocol 

We refer the reader to [9] for a detailed description of the A-GDH.2 protocol.
Let $P = \{1, \ldots, n, I\}$ be the set of principals that may be involved in
a run of an A-GDH.2 protocol where I is the name of the intruder (who can be
both a legitimate participant and a dishonest principal). Any two principals
$i, j \in P$ share a long-term secret key $K_{i,j}$ ($= K_{j,i}$). In a
protocol run, a group $G \subseteq P$ of principals (membership in a group may
vary from one run to another) establishes a session key that at the end of the
protocol run is only known to the members of the group as long as all members
in G are honest (implicit key authentication). In a run, one principal plays
the role of the so-called master. Assume for example that $A, B, C, D \in P$
want to share a session key and that D is the master. Then, A sends a message
to B, B sends a message to C, and C sends a message to the master D. Then, D
computes the session key for himself and also broadcasts keying material to A,
B, and C using the long-term secret keys shared with these principals, who can
each derive the session key from this material. We call A the first, B the
second, and C the third member of the group.

We now give a formal specification of the protocol in our protocol model. We
abbreviate terms $\langle t_1, \langle t_2, \ldots \langle t_{n-1}, t_n \rangle \cdots \rangle\rangle$
by $t_1, \ldots, t_n$. We will define protocol rules $\Pi^{l,j}_{i,p,p'}$,
which describe the $l$th step, $l \in \{1, 2\}$, of principal $p \in P$, in the
$j$th instance of p, j > 0, acting as the $i$th member of the group in which
$p' \in P$ is the master. The relation
$\Pi^{1,j}_{i,p,p'} < \Pi^{2,j}_{i,p,p'}$ is the only partial order
relationship between protocol rules. By $r^{p,j}$ we denote a random number (an
atomic message) generated by p in instance j, and $secret^{p,j}$ denotes a
secret (some atomic message) of p in instance j. We define
$\Pi^{1,j}_{1,p,p'}$, i.e. the first step of p in instance j acting as the
first member of the group (i.e., the initiator of the protocol) where p' is the
master, to be

$1 \Rightarrow \alpha, \mathit{Exp}(\alpha, r^{p,j})$

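where $\alpha$ is a group generator (an atomic message), and for i > 1 we
define $\Pi^{1,j}_{i,p,p'}$ to be

$x_1^{p,j}, \ldots, x_i^{p,j} \Rightarrow \mathit{Exp}(x_1^{p,j}, r^{p,j}), \ldots, \mathit{Exp}(x_{i-1}^{p,j}, r^{p,j}), x_i^{p,j}, \mathit{Exp}(x_i^{p,j}, r^{p,j})$

where the $x_k^{p,j}$ are variables. The second step $\Pi^{2,j}_{i,p,p'}$ of p
in instance j as $i$th member, i > 0, is the protocol rule

$y^{p,j} \Rightarrow \{secret^{p,j}\}^s_{\mathit{Exp}(y^{p,j},\, r^{p,j} \cdot K_{p,p'}^{-1})}$
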
Note that $\mathit{Exp}(y^{p,j}, r^{p,j} \cdot K_{p,p'}^{-1})$ is the session
key computed by p and that implicit key authentication requires that no
principal outside of the group can get hold of $secret^{p,j}$. We now define
the protocol rule $M^{j}_{p_1 \ldots p_h, p}$ which describes principal
$p \in P$ in the $j$th instance acting as master for the group
$p_1, \ldots, p_h, p \in P$ (in this order) where p is the last member of that
group. We define $M^{j}_{p_1 \ldots p_h, p}$ to be

$z_1^{p,j}, \ldots, z_{h+1}^{p,j} \Rightarrow \mathit{Exp}(z_1^{p,j}, r^{p,j} \cdot K_{p_1,p}), \ldots, \mathit{Exp}(z_h^{p,j}, r^{p,j} \cdot K_{p_h,p}), \{secret^{p,j}\}^s_{\mathit{Exp}(z_{h+1}^{p,j},\, r^{p,j})}$

where the $z_k^{p,j}$ are variables,
$\mathit{Exp}(z_k^{p,j}, r^{p,j} \cdot K_{p_k,p})$ is the keying material for
$p_k$, and the message $\mathit{Exp}(z_{h+1}^{p,j}, r^{p,j})$ is the session
key computed by the master p.

The following protocol P describes two sessions of the A-GDH.2 protocol,
one for the group $p, p', I, p'' \in P$ and one for the group $p, p', p''$
where in both cases $p''$ is the master of the group. Note that in the first
instance, the actions of the intruder I need not be defined. Formally, the set
of protocol rules in
P consists of the rules describing p in the first session , TTf 2 pi> (note 

that < n^'2p")^ the rules for p' in the first session ^2 2V0 

and the master of the first session. The protocol rules of the second ses- 
sion are . The initial intruder knowledge 

is {a, r^’^} U {Kpi | p G P}. Let secret be some of the secrets returned by p or p' 
in the second session. Note that since the intruder is not a member of the group 
of the second session, he should not be able to obtain secret. However, as shown 
in [9], there exists an attack on P. It is easy to verify that this attack will be 
found by our decision procedure. 

6 Conclusion 

We have shown that a class of protocols with Diffie-Hellman exponents can be
automatically analyzed. We strongly conjecture that RSA encryption can easily
be integrated into this framework: We have to limit the intruder to only use
non-negative product exponents when performing the exponentiation operation
and to allow terms of the form $t^{-1}$ (which stands for t's private key)
outside of exponents. More generally, in future work we plan to consider the
case of arbitrary products outside of exponents and to allow the intruder to
perform abelian group operators outside of exponents.
Abstract. In this paper, we show the decidability and NP-completeness
of the satisfiability problem for non-structural subtyping constraints in
quasi-lattices. This problem, first introduced by Smolka in 1989, is important
for the typing of logic and functional languages. The decidability result is
obtained by generalizing Trifonov and Smith's algorithm over lattices to the
case of quasi-lattices, with a complexity in $O(m^v M^v n^3)$, where m (resp.
M) stands for the number of minimal (resp. maximal) elements of the
quasi-lattice, v is the number of unbounded variables and n is the number of
constraints. Similarly, we extend Pottier's algorithm for computing explicit
solutions to the case of quasi-lattices. Finally we discuss some applications
of these results to type inference in constraint logic programming and
functional programming languages.



1 Introduction 

The search for more and more flexible type systems for programming languages
goes with the search for algorithms for solving subtyping constraints in more
and more complex type structures. Type checking and type inference algorithms
for a program basically consist in solving systems of subtyping constraints of
the form $\exists X \bigwedge_{i=1}^{n} t_i \leq t'_i$ where $t_i, t'_i$ are
types and X is the set of variables appearing in the system.

In its most general form, non-structural subtyping combines subtyping and
parametric polymorphism and allows subtyping relations between type
constructors of different arities. For instance, in the type system for
constraint logic programming TCLP [6], the subtyping relation
$list(\alpha) \leq term$ allows us to see a (homogeneous) list as a term. In a
lattice of type constructors, Trifonov and Smith [14] gave a simple
decomposition algorithm, with a complexity in $O(n^3)$, for testing the
satisfiability of non-structural subtyping constraints in a lattice of infinite
or regular types. Pottier [11] extended this algorithm to compute solutions
explicitly when they exist. However, the lattice structure of type constructors
imposes the existence of a minimal element $\bot$ and a maximal element $\top$,
and thus does not treat the typing with the empty type $\bot$ as an error. In
the particular case of object types à la Abadi-Cardelli [1], where type
constructors are defined and ordered by their invariant or covariant labels,
Palsberg, Zhao and Jim [9] gave an $O(n^3)$ algorithm for solving subtyping
constraints in this specific type structure.

In this paper, we are interested in solving non-structural subtyping
constraints in more general structures than lattices. We consider
quasi-lattices of types, that is partially ordered sets for which any non-empty
subset having a lower bound (resp. an upper bound) has a greatest lower bound
(resp. least upper bound). These structures allow the absence of the types
$\top$ and $\bot$. The decidability of non-structural subtyping constraints
satisfiability in quasi-lattices is an open problem mentioned in Smolka's
thesis [13]. One difficulty of non-structural subtyping is its capacity to
forget arguments of parametric types. For example, let us consider the types
$list(\alpha)$ representing homogeneous lists and nhlist representing
non-homogeneous lists, with $list(\alpha) \leq nhlist$. Let us also consider
the following constraint system: $list(nhlist) \leq \alpha$,
$list(int) \leq \alpha$. In a lattice, it is equivalent to
$list(\top) \leq \alpha$. In a quasi-lattice without $\top$ element, the system
also has a solution $\alpha = nhlist$. It is thus not correct to solve the
system in the lattice obtained by completion with $\top$ and $\bot$, and then
simply check the absence of $\top$ and $\bot$ in the bounds.

In this paper, we bring a positive answer to the decidability problem by
generalizing Trifonov and Smith's algorithm to quasi-lattices, and we prove the
NP-completeness of this problem. The rest of the paper is organized as follows.
In the next section, we define the ordered set of (possibly infinite) types
formed upon a quasi-lattice of type constructors of different arities, and we
prove that this set is a quasi-lattice. In section 3, we show that in
quasi-lattices, the systems closed by Trifonov and Smith's decomposition rules
are satisfiable, and we give an algorithm for testing the satisfiability of
subtyping constraints with a time complexity in $O(m^v M^v n^3)$, where m
(resp. M) stands for the number of minimal (resp. maximal) elements of the
quasi-lattice, v is the number of unbounded variables and n is the number of
constraints. The NP-completeness of constraint satisfiability is shown in this
section by using the result of Pratt and Tiuryn for n-crowns [12]. In section
4, we generalize Pottier's algorithm for computing explicit solutions in
quasi-lattices. Section 5 presents some applications of these results to type
checking and we conclude in the last section. Most proofs have been omitted for
lack of space; they are given in [5].

2 Types 

2.1 Preliminaries 

Let $(E, \leq)$ be a partially ordered set. For a nonempty subset S of E, we
note $\downarrow S = \{x \in E \mid \forall y \in S,\ x \leq y\}$ the set of
lower bounds of S and
$\uparrow S = \{x \in E \mid \forall y \in S,\ y \leq x\}$ the set of upper
bounds of S. For the empty set, $\downarrow\emptyset = \emptyset$ and
$\uparrow\emptyset = \emptyset$. We note $\sqcap S$ (resp. $\sqcup S$) the
greatest lower bound (resp. least upper bound) of S whenever it exists. A lower
quasi-lattice (resp. upper quasi-lattice) is a partially ordered set where any
finite subset having a lower (resp. upper) bound has a greatest lower bound
(resp. a least upper bound). A quasi-lattice is an upper and a lower
quasi-lattice.
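
For a finite ordered set, the quasi-lattice property can be checked by brute force. The following sketch is only an illustration under stated assumptions: the function names and the two example orders are hypothetical, and the order is given as a set of pairs that is assumed to be already transitively closed.

```python
from itertools import combinations


def lower_bounds(subset, elements, leq):
    """Elements that are below every member of subset."""
    return {x for x in elements if all(leq(x, y) for y in subset)}


def upper_bounds(subset, elements, leq):
    """Elements that are above every member of subset."""
    return {x for x in elements if all(leq(y, x) for y in subset)}


def greatest(candidates, leq):
    """Return the greatest element of candidates, or None if there is none."""
    for g in candidates:
        if all(leq(x, g) for x in candidates):
            return g
    return None


def least(candidates, leq):
    """Return the least element of candidates, or None if there is none."""
    for g in candidates:
        if all(leq(g, x) for x in candidates):
            return g
    return None


def is_quasi_lattice(elements, leq):
    """Every non-empty subset with a lower (upper) bound has a glb (lub)."""
    elements = list(elements)
    for size in range(1, len(elements) + 1):
        for subset in combinations(elements, size):
            lows = lower_bounds(subset, elements, leq)
            ups = upper_bounds(subset, elements, leq)
            if lows and greatest(lows, leq) is None:
                return False
            if ups and least(ups, leq) is None:
                return False
    return True


def make_leq(pairs):
    """Reflexive order from a set of pairs assumed transitively closed."""
    return lambda x, y: x == y or (x, y) in pairs


# list <= nhlist, int unrelated: no top or bottom element, yet a quasi-lattice.
order1 = make_leq({("list", "nhlist")})
# c and d are incomparable lower bounds of both a and b: not a quasi-lattice.
order2 = make_leq({("c", "a"), ("c", "b"), ("d", "a"), ("d", "b")})
```

The second order fails because the set {a, b} has the lower bounds c and d but no greatest one, which is the kind of situation the definition rules out.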

Definition 1 (Complete quasi-lattice). A partially ordered set $(E, \leq)$ is a
complete quasi-lattice (in the sense of sets) if for all non empty subsets
$S \subseteq E$, $\sqcap S$ exists whenever $\downarrow S \neq \emptyset$ and
$\sqcup S$ exists whenever $\uparrow S \neq \emptyset$.

2.2 Labels

As mentioned in the introduction, we are interested in type languages allowing
subtyping relations between type constructors of different arities, like
$list(\alpha) \leq term$ for instance. In general, such subtyping relations
specify subtyping relations between specific arguments of the type
constructors. For instance, by writing $k_1(\alpha, \beta) \leq k_2(\beta)$, we
specify that types built with $k_1$ are subtypes of the ones built with $k_2$
when the second argument of $k_1$ and the argument of $k_2$ correspond, the
first argument of $k_1$ being forgotten in the subtype relationship.

From a formal point of view, it is simpler (and more general) to express the 
relationship between arguments by working with a structure of labeled terms. 
In the formalism of Pottier [10], each argument of a constructor is indicated by 
a label instead of a position. Moreover, positive and negative labels are distin- 
guished in order to express the covariance or the contravariance of arguments 
w.r.t. the subtyping relation. 

So let $\mathcal{L}^+$ and $\mathcal{L}^-$ be two disjoint countable sets of
labels; we note $\mathcal{L} = \mathcal{L}^+ \uplus \mathcal{L}^-$. Let
$(\mathcal{K}, \leq_\mathcal{K})$ be a complete quasi-lattice of type
constructors. Let a be the arity function defined from $\mathcal{K}$ into the
finite parts of $\mathcal{L}$. We denote by $a^+$ (resp. $a^-$) the function
which associates the positive (resp. negative) labels to a constructor. We
assume that there is at least one type constructor with an empty arity, $k_0$.

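Definition 2. $(\mathcal{K}, \leq_\mathcal{K}, \mathcal{L}^+, \mathcal{L}^-, a)$ is a signature if:

1. for all $k_1 \leq_\mathcal{K} k_2 \leq_\mathcal{K} k_3$, $a(k_1) \cap a(k_3) \subseteq a(k_2)$.

2. for all $S \subseteq \mathcal{K}$, if $\sqcap S$ exists, then $a(\sqcap S) \subseteq \bigcup_{k \in S} a(k)$.

3. for all $S \subseteq \mathcal{K}$, if $\sqcup S$ exists, then $a(\sqcup S) \subseteq \bigcup_{k \in S} a(k)$.

4. for all $k_1 \leq_\mathcal{K} k_2$, there exists k s.t. $k_1 \leq_\mathcal{K} k \leq_\mathcal{K} k_2$ and $a(k) = a(k_1) \cap a(k_2)$.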

Conditions 1, 2, 3 express the coherence of labels w.r.t. the order relation
and are similar to the ones found in [10] for lattices. Condition 4 is specific
to quasi-lattices; its purpose is to forbid signatures like
$k_1(\alpha) \leq_\mathcal{K} k_2(\beta)$ which do not induce a quasi-lattice
structure for types. For example, if $k_3$ and $k_4$ are not comparable, then
$k_2(k_3)$ and $k_2(k_4)$ have common lower bounds, like $k_1(k_3)$ and
$k_1(k_4)$, but do not have a greatest lower bound.

For a signature $(\mathcal{K}, \leq_\mathcal{K}, \mathcal{L}^+, \mathcal{L}^-, a)$,
we note $\mathcal{L}^*$ the set of finite strings of labels, $\varepsilon$ the
empty string, "." the string concatenation and |w| the length of w. We are
interested in (possibly infinite) types formed upon $\mathcal{K}$, where the
positions of subterms are defined by strings of labels.

Definition 3. Let $(\mathcal{K}, \leq_\mathcal{K}, \mathcal{L}^+, \mathcal{L}^-, a)$
be a signature. A (possibly infinite) type is a partial mapping t from
$\mathcal{L}^*$ into $\mathcal{K}$ such that:

1. Its domain is prefix closed: $\forall w = w_1.w_2 \in dom(t),\ w_1 \in dom(t)$.

2. $\varepsilon \in dom(t)$.

3. For all positions $w \in dom(t)$, for all $l \in \mathcal{L}$, $w.l \in dom(t)$ if and only if $l \in a(t(w))$.

We note $\mathcal{T}(S)$ the set of (possibly infinite) types built upon the
signature S. In the following, we assume a fixed signature
$S = (\mathcal{K}, \leq_\mathcal{K}, \mathcal{L}^+, \mathcal{L}^-, a)$ and we
note $\mathcal{T} = \mathcal{T}(S)$ the set of types built upon S. We note t/w
the type $t' : v \mapsto t(w.v)$, that is the subterm occurring at position w
in t. We note U/l the set of subterms of types in $U \subseteq \mathcal{T}$
occurring at position $l \in \mathcal{L}$, that is
$U/l = \{t/l \mid t \in U \wedge l \in a(t(\varepsilon))\}$.
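
For finite types, the view of a type as a partial mapping from positions to constructors can be made concrete as follows. This is a minimal sketch, assuming positions are encoded as tuples of labels and types as Python dictionaries; the names `is_type` and `subterm` and the example arity table are hypothetical.

```python
def is_type(t, arity):
    """Check the three conditions of a type on a finite mapping t.

    t maps positions (tuples of labels) to constructor names;
    arity maps each constructor name to its set of labels.
    """
    if () not in t:                        # the empty string is in dom(t)
        return False
    for pos, k in t.items():
        if pos and pos[:-1] not in t:      # the domain is prefix closed
            return False
        children = {p[-1] for p in t if len(p) == len(pos) + 1 and p[:-1] == pos}
        if children != set(arity[k]):      # w.l in dom(t) iff l in a(t(w))
            return False
    return True


def subterm(t, w):
    """Return t/w, the mapping v -> t(w.v)."""
    n = len(w)
    return {pos[n:]: k for pos, k in t.items() if pos[:n] == w}


# Illustrative signature fragment: list has one label "l", int has none.
arity = {"list": {"l"}, "int": set()}
list_of_int = {(): "list", ("l",): "int"}
```

With this encoding, `subterm(list_of_int, ("l",))` is the type int, and a mapping that assigns list to the root without providing the argument at label l is rejected by `is_type`.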

Example 1. We shall use the following example of quasi-lattice of type
constructors given with their labels,
$\{k_0, k_1, k_2(l_1, l_2, l_3), k_3(l_2, l_3), k_4(l_2), k_5(l_3)\}$, where
$\mathcal{L}^+ = \{l_1, l_2, l_3\}$, $\mathcal{L}^- = \emptyset$, and the
subtyping relation is pictured out as follows:

            k_2(l_1, l_2, l_3)

   k_0    k_1    k_3(l_2, l_3)

          k_4(l_2)    k_5(l_3)

Definition 4. A type constructor $k' \in \mathcal{K}$ is a lower (resp. upper)
bound of another constructor $k \in \mathcal{K}$ w.r.t. a set of labels
$L \subseteq \mathcal{L}$ if $k' \leq_\mathcal{K} k$ (resp.
$k \leq_\mathcal{K} k'$) and $a(k) \cap a(k') \subseteq L$.

We note f^k (resp. f^k) the set of lower (resp. upper) bounds of k w.r.t. L. In 
example 1, we have = {/C3, k^, k^} and = 0. Next, we define the 

subset of labels of k occurring in h^k: 

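Definition 5. For a set of labels $L \subseteq \mathcal{L}$, the subset of
significant labels of L under (resp. over) k is the set

$SL\downarrow_L k = a(k) \cap \bigcup_{k' \in \downarrow_L k} a(k')$

(resp. $SL\uparrow_L k = a(k) \cap \bigcup_{k' \in \uparrow_L k} a(k')$).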

In example 1, we have SLfyi^ i^yk2 = {h} and SLfyi.^ i^yk2 = 0. One can 
easily check using the conditions of the definition 2 of a signature the following: 

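Proposition 1. If $\downarrow_L k \neq \emptyset$, then $\downarrow_L k$ has a
maximum $\sqcup\downarrow_L k$ and $a(\sqcup\downarrow_L k) = SL\downarrow_L k$.
If $\uparrow_L k \neq \emptyset$, then $\uparrow_L k$ has a minimum
$\sqcap\uparrow_L k$ and $a(\sqcap\uparrow_L k) = SL\uparrow_L k$.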

2.3 Subtype Ordering 

The subtyping relation $\leq$ is defined over types, as the intersection of a
sequence $(\leq_n)$ of preorders over types defined by:

- $\leq_0 = \mathcal{T} \times \mathcal{T}$

- $t \leq_{n+1} t'$ if $t(\varepsilon) \leq_\mathcal{K} t'(\varepsilon)$ and for all labels $l \in a(t(\varepsilon)) \cap a(t'(\varepsilon))$:

  • either $l \in \mathcal{L}^+$ and $t/l \leq_n t'/l$

  • or $l \in \mathcal{L}^-$ and $t'/l \leq_n t/l$

Proposition 2. $\leq$ is an order over $\mathcal{T}$.

Proposition 3. Let $t_1, t_2 \in \mathcal{T}$. $t_1 \leq t_2$ if and only if
$t_1(\varepsilon) \leq_\mathcal{K} t_2(\varepsilon)$ and for all labels
$l \in a(t_1(\varepsilon)) \cap a(t_2(\varepsilon))$:

- either $l \in \mathcal{L}^+$ and $t_1/l \leq t_2/l$
- or $l \in \mathcal{L}^-$ and $t_2/l \leq t_1/l$
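
On finite types, the characterization of Proposition 3 translates directly into a recursive check. The sketch below is illustrative only: it assumes finite types encoded as pairs (constructor, dictionary from labels to argument types), and the constructor order, the labels "l", "a", "r" and the helper names are hypothetical choices for the example.

```python
def subtype(t1, t2, leq_k, negative=frozenset()):
    """Decide t1 <= t2 on finite types encoded as (constructor, {label: type}).

    leq_k is the order on constructors; negative is the set of
    contravariant labels, all other labels being covariant.
    """
    head1, args1 = t1
    head2, args2 = t2
    if not leq_k(head1, head2):
        return False
    # Only the labels common to both head constructors are compared.
    for label in args1.keys() & args2.keys():
        if label in negative:
            if not subtype(args2[label], args1[label], leq_k, negative):
                return False
        elif not subtype(args1[label], args2[label], leq_k, negative):
            return False
    return True


# Illustrative constructor order: int <= float and list <= nhlist.
pairs = {("int", "float"), ("list", "nhlist")}
leq_k = lambda a, b: a == b or (a, b) in pairs

INT, FLOAT, NHLIST = ("int", {}), ("float", {}), ("nhlist", {})


def lst(t):
    return ("list", {"l": t})


def arrow(a, r):
    return ("->", {"a": a, "r": r})
```

In this encoding list(int) is a subtype of nhlist because the two head constructors share no label, so the argument of list is simply forgotten.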

Now, our goal is to show that this ordered set of types forms a quasi-lattice.
First we define the set of usable labels under a set of types S as the set of
labels l such that S/l has a lower bound:

Definition 6. The set of usable labels under a set of types
$S \subseteq \mathcal{T}$ is the set

$UL\downarrow S = \{l \in \mathcal{L}^+ \mid \downarrow(S/l) \neq \emptyset\} \cup \{l \in \mathcal{L}^- \mid \uparrow(S/l) \neq \emptyset\}$

The set of usable labels above S is the set

$UL\uparrow S = \{l \in \mathcal{L}^+ \mid \uparrow(S/l) \neq \emptyset\} \cup \{l \in \mathcal{L}^- \mid \downarrow(S/l) \neq \emptyset\}$

For example, with the types t = /c2(fco) ^4(^0)) and t' = k^{ki,k^{k\)) 
formed over the constructors of example 1 , we have ULf{t,t'} = {^1,^2} and 
ULf{t, t'} = {h,l2, ^3}- The head constructor of greatest lower bounds and least 
upper bounds in T is given by: 

Definition 7. For a set of types $S \subseteq \mathcal{T}$, the greatest lower
bound constructor of S is the constructor noted
$\sqcap_g S = \sqcup\downarrow_{(UL\downarrow S)}(\sqcap\{s(\varepsilon) \mid s \in S\})$,
the least upper bound constructor of S is the constructor noted
$\sqcup_g S = \sqcap\uparrow_{(UL\uparrow S)}(\sqcup\{s(\varepsilon) \mid s \in S\})$.

Now, we define a sequence of types that approximates the greatest lower
bound of a set of types up to a given depth. The first type of the sequence is
an arbitrary type constant of arity 0, which simply plays the role of a place
holder¹.

Definition 8. The greatest lower (resp. least upper) bound of rank n of a non
empty set $S \subseteq \mathcal{T}$ of types, noted $\sqcap_n S$ (resp.
$\sqcup_n S$), is defined by:

- $\sqcap_0 S = \sqcup_0 S = k_0$

- $(\sqcap_{n+1} S)(\varepsilon) = \sqcap_g S$ and for all labels $l \in a(\sqcap_g S)$:

  • if $l \in \mathcal{L}^+$ then $(\sqcap_{n+1} S)/l = \sqcap_n(S/l)$

  • if $l \in \mathcal{L}^-$ then $(\sqcap_{n+1} S)/l = \sqcup_n(S/l)$

- $(\sqcup_{n+1} S)(\varepsilon) = \sqcup_g S$ and for all labels $l \in a(\sqcup_g S)$:

  • if $l \in \mathcal{L}^+$ then $(\sqcup_{n+1} S)/l = \sqcup_n(S/l)$

  • if $l \in \mathcal{L}^-$ then $(\sqcup_{n+1} S)/l = \sqcap_n(S/l)$

¹ In the proofs, $k_0$ is compared to other types through the relation
$\leq_0$ which is equal to $\mathcal{T} \times \mathcal{T}$. This means that
the type $k_0$ does not need to be a subtype or a supertype of any other type
and that it may be replaced in the definition by any arbitrary type.

For example, let $t = k_4(k_4(k_1))$ and $t' = k_4(k_5(k_1))$ be two types
formed over the constructors of example 1. We have $t \sqcup_0 t' = k_0$,
$t \sqcup_1 t' = k_4(k_0)$, $t \sqcup_2 t' = k_4(k_3(k_0, k_0))$ and for all
$n \geq 3$, $t \sqcup_n t' = k_4(k_3(k_1, k_1))$.

This provides the following construction of the greatest lower bound and
the least upper bound of a set of types, showing that $(\mathcal{T}, \leq)$ is
a quasi-lattice (Theorem 1).

Definition 9. Let $\sqcap_\mathcal{T} : \wp(\mathcal{T}) \to (\mathcal{L}^* \to \mathcal{K})$
(resp. $\sqcup_\mathcal{T}$) be a partial mapping defined by:

$(\sqcap_\mathcal{T} S)(w) = (\sqcap_{n+1} S)(w)$ (resp. $(\sqcup_\mathcal{T} S)(w) = (\sqcup_{n+1} S)(w)$)

for all non empty sets of types $S \subseteq \mathcal{T}$, for all
$n \in \mathbb{N}$, for all positions $w \in dom(\sqcap_{n+1}(S))$ (resp.
$dom(\sqcup_{n+1}(S))$) such that |w| = n.

Using the types of the previous example, we have
$t \sqcup_\mathcal{T} t' = k_4(k_3(k_1, k_1))$.

Theorem 1. $(\mathcal{T}, \leq)$ is a complete quasi-lattice, where
$\sqcap_\mathcal{T}$ denotes greatest lower bounds and $\sqcup_\mathcal{T}$
denotes least upper bounds.

3 Testing the Satisfiability of Subtyping Constraints 

Let V be a countable set of variables, noted $\alpha, \beta, \ldots$ Types with
variables are defined as the set, noted $\mathcal{T}_V$, of (possibly infinite)
types built upon the signature
$(\mathcal{K} \cup V, \leq_\mathcal{K}, \mathcal{L}^+, \mathcal{L}^-, a)$. A
subtyping constraint is a pair of finite types $t_1$ and $t_2$ in
$\mathcal{T}_V$ and is noted $t_1 \leq t_2$. For a system C of subtyping
constraints, we note V(C) the set of variables occurring in C.

Definition 10. A substitution $\rho : V \to \mathcal{T}$ satisfies the
constraint $t_1 \leq t_2$, noted $\rho \models t_1 \leq t_2$, if
$\rho(t_1) \leq \rho(t_2)$. The subtyping constraint $t_1 \leq t_2$ is
satisfiable if there exists a substitution $\rho$ such that
$\rho \models t_1 \leq t_2$.

For the sake of simplicity, we will suppose, without loss of generality, that
the constraint systems considered contain only flat terms. A flat term is
either a variable, a constant, or a term of depth 1 where all leaves are
variables. For example, int, $list(\alpha)$ and $\alpha$ are flat terms while
list(int) is not. Clearly, given a constraint system, one can find an
equivalent constraint system where all terms are flat terms, by introducing
variables for arguments of terms that are not flat terms, and by introducing
equality (double inequality) constraints between these variables and the
corresponding arguments.



3.1 Closed Systems 

We first define pre-closed systems as constraint systems where variables are
bounded. We recall in Table 1 the partial function dec used for breaking
constraints in Trifonov and Smith's algorithm [14].

Definition 11 (Pre-closed systems). A constraint system C is said to be
upper pre-closed if for all variables $\alpha \in V(C)$, there exists
$t \notin V$ such that $t \leq \alpha \in C$. C is said to be lower pre-closed
if for all $\alpha \in V(C)$, there exists $t \notin V$ such that
$\alpha \leq t \in C$. A constraint system is said to be pre-closed if it is
upper and lower pre-closed.



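Table 1. Trifonov and Smith's partial decomposition function dec [11,14].
$\alpha, \beta$ denote type variables and $t, t_1, t_2$ denote non variable
types.

$dec(\alpha \leq \beta) = \{\alpha \leq \beta\}$    $dec(\alpha \leq t) = \{\alpha \leq t\}$    $dec(t \leq \alpha) = \{t \leq \alpha\}$

if $t_1(\varepsilon) \leq_\mathcal{K} t_2(\varepsilon)$ then:

$dec(t_1 \leq t_2) = \bigcup_{l \in a^+(t_1(\varepsilon)) \cap a^+(t_2(\varepsilon))} \{t_1/l \leq t_2/l\} \;\cup\; \bigcup_{l \in a^-(t_1(\varepsilon)) \cap a^-(t_2(\varepsilon))} \{t_2/l \leq t_1/l\}$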



Definition 12 (Closed system). A constraint system C is closed if it is
pre-closed and if for all constraints $c \in C$, dec(c) is defined and included
in C and for all $\{t_1 \leq \alpha, \alpha \leq t_2\} \subseteq C$,
$dec(t_1 \leq t_2)$ is defined and included in C.

The application of the decomposition function on non variable types is used to
enforce the presence of the corresponding inequalities between their arguments.
For example, in a closed constraint system C, if
$list(\alpha) \leq list(\beta) \in C$ then $\alpha \leq \beta \in C$. The last
condition is used to enforce the transitivity of the constraint system.

In the case of lattices, the substitution $\rho$ such that for all variables
$\alpha \in V(C)$, $\rho(\alpha) = \sqcup_\mathcal{T}(\Downarrow\alpha)$ is
defined and is a solution to C. However, in the case of quasi-lattices, the
choice of the head constructor of a least upper bound depends on the existence
of a least upper bound for each argument², and such a solution cannot be easily
defined. For example, let us consider the type constructors list, nhlist and
int with $a(list) = \{l\}$ and $list \leq_\mathcal{K} nhlist$, and the
following pre-closed constraint system
$C = \{list(\beta) \leq \alpha, list(\delta) \leq \alpha, int \leq \beta \leq int, nhlist \leq \delta \leq nhlist\}$.
The only solution of this system is
$\rho : \alpha \mapsto nhlist, \beta \mapsto int, \delta \mapsto nhlist$; in
particular we have $\rho(\alpha)(\varepsilon) = nhlist$, whereas the least
upper bound of the head constructors of the lower bounds of $\alpha$ is list.
Such solutions are constructed in the proof of theorem 2 below.

Some technical notions are necessary. For a variable $\alpha$, let
$\Uparrow_C \alpha = \{t \mid t \notin V, \alpha \leq t \in C\}$ be the set of
upper bounds of $\alpha$ in C, and let
$\Downarrow_C \alpha = \{t \mid t \notin V, t \leq \alpha \in C\}$ be the set
of lower bounds of $\alpha$ in C. For a set of variables A, we note
$\Uparrow_C A = \bigcup_{\alpha \in A} \Uparrow_C \alpha$ the set of types that
are an upper bound of an element of A in C, and
$\Downarrow_C A = \bigcup_{\alpha \in A} \Downarrow_C \alpha$ the set of types
that are a lower bound of an element of A in C. By abuse of notation, when C is
clear from the context, we will omit C in the notations.

² or a greatest lower bound in the case of contravariant arguments

Definition 13. Given a constraint system C, let
$sol : \wp(V(C)) \times \wp(V(C)) \to \mathcal{K}$ be the partial function
defined by $sol(A, B) = \sqcup\downarrow_{a(\sqcup D)}(\sqcap U)$ when it
exists, where $U = \{t(\varepsilon) \mid t \in \Uparrow B\}$ and
$D = \{t(\varepsilon) \mid t \in \Downarrow A\}$.

Lemma 1. In a closed system C, sol(A, B) is defined for all non empty sets
A, B such that $\forall \alpha \in A, \forall \beta \in B,\ \alpha \leq \beta \in C$.

Lemma 2. Let C be a closed constraint system and $A, B \subseteq V(C)$
verifying the conditions of lemma 1. For all labels $l \in a(sol(A, B))$, if
$l \in \mathcal{L}^+$ then $(\Downarrow A/l, \Uparrow B/l)$ satisfies the
condition of lemma 1, and if $l \in \mathcal{L}^-$ then
$(\Uparrow B/l, \Downarrow A/l)$ satisfies the condition of lemma 1.

Lemma 3. Let C be a closed constraint system. Let
$A, B, E, F \subseteq V(C)$. If $\Downarrow A \subseteq \Downarrow E$ and
$\Uparrow F \subseteq \Uparrow B$ and A, B and E, F satisfy the conditions of
lemma 1, then $sol(A, B) \leq_\mathcal{K} sol(E, F)$.

Theorem 2. In a quasi-lattice, any closed constraint system is satisfiable. 

Given a pre-closed system C, one can compute its closure, as in Trifonov
and Smith's algorithm [14], in $O(n^3)$. The algorithm proceeds by computing
the sequence $C^0 = C, C^1, C^2, \ldots$ defined by:

$C^{n+1} = C^n \cup \bigcup_{c \in C^n} dec(c) \cup \bigcup_{\{t \leq \alpha, \alpha \leq t'\} \subseteq C^n} dec(t \leq t')$

If dec is not defined for some constraint in $C^n$ then the constraint is not
satisfiable, that is $C^n$ has no solution. Otherwise, for any $C^n$,
$C^{n+1}$ is defined and it is clearly equivalent to $C^n$. In this case, the
sequence reaches a fix point which is closed, hence satisfiable by theorem 2,
and equivalent to C, so C is satisfiable.
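
The closure computation can be sketched as a fix point iteration over a set of constraints. This is a hedged illustration rather than the authors' implementation: the encoding (variables as strings, non-variable flat types as pairs of a constructor and a tuple of (label, argument) pairs), the function names, and the choice to also combine constraints transitively through variables on both sides are all assumptions of the sketch.

```python
def is_var(t):
    """Variables are encoded as strings, other types as (head, args) tuples."""
    return isinstance(t, str)


def dec(constraint, leq_k, negative=frozenset()):
    """Decompose one constraint; return None when dec is undefined."""
    left, right = constraint
    if is_var(left) or is_var(right):
        return {constraint}
    (head1, args1), (head2, args2) = left, right
    if not leq_k(head1, head2):
        return None
    a1, a2 = dict(args1), dict(args2)
    out = set()
    for label in a1.keys() & a2.keys():
        if label in negative:              # contravariant label: swap sides
            out.add((a2[label], a1[label]))
        else:                              # covariant label
            out.add((a1[label], a2[label]))
    return out


def closure(system, leq_k, negative=frozenset()):
    """Iterate decomposition to a fix point; return None if dec is undefined."""
    current = set(system)
    while True:
        new = set(current)
        candidates = set(current)
        for (t, a) in current:
            if is_var(a):                  # pairs t <= a and a <= t2
                candidates |= {(t, t2) for (b, t2) in current if b == a}
        for c in candidates:
            d = dec(c, leq_k, negative)
            if d is None:
                return None
            new |= d
        if new == current:
            return current
        current = new


# Illustrative constructor order and flat types.
leq_k = lambda x, y: x == y or (x, y) in {("list", "nhlist")}
list_b = ("list", (("l", "b"),))
list_d = ("list", (("l", "d"),))
INT = ("int", ())
```

For instance, closing the system {list(b) ≤ a, a ≤ list(d)} adds the constraint b ≤ d, while the system {int ≤ a, a ≤ list(d)} is rejected because int and list are not comparable.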

Corollary 1. A pre-closed constraint system C is satisfiable in infinite types
if and only if it is satisfiable in regular types.

3.2 Pre-closure Algorithm 

The algorithm above requires a pre-closed system as input. This condition is
automatically fulfilled in lattices since there exists a maximal type $\top$
and a minimal type $\bot$. In this case, it is sufficient to add constraints
$\bot \leq \alpha$ and $\alpha \leq \top$ to obtain a pre-closed system with
the same solutions [11,14]. In quasi-lattices, Theorem 3 below provides
sufficient conditions over $\mathcal{K}$ for deciding the satisfiability of a
non pre-closed constraint system. Let $\overline{\mathcal{K}}$ be the set of
maximal elements of $\mathcal{K}$ and $\underline{\mathcal{K}}$ the set of its
minimal elements.

Theorem 3. If $\mathcal{K}$ verifies the following conditions:

1. $\forall k \in \overline{\mathcal{K}} \cup \underline{\mathcal{K}},\ a(k) = \emptyset$

2. For all $k \in \mathcal{K}$, there exist $k_1 \in \underline{\mathcal{K}}$ and $k_2 \in \overline{\mathcal{K}}$ such that $k_1 \leq_\mathcal{K} k \leq_\mathcal{K} k_2$.




144 E. Coquery and F. Fages



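For any constraint system C let the set of pre-closures pc(C) be:

$pc(C) = \left\{ C \cup \bigcup_{\alpha \in V(C)} \{t_\alpha \leq \alpha, \alpha \leq t'_\alpha\} \;\middle|\; t_\alpha(\varepsilon) \in \underline{\mathcal{K}},\ t'_\alpha(\varepsilon) \in \overline{\mathcal{K}} \right\}$

All elements in pc(C) are pre-closed and the union of their sets of solutions
is equal to the set of solutions of C.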

If $\overline{\mathcal{K}}$ and $\underline{\mathcal{K}}$ are finite sets, it
is possible to enumerate the elements of pc(C). Since these elements are
pre-closed, one can test their satisfiability using the closure algorithm of
the previous section. This gives an algorithm for testing the satisfiability of
non pre-closed constraint systems in quasi-lattices with a finite number of
extrema each with an empty arity. The time complexity of the satisfiability
test is in $O(n^3 m^v M^v)$ where n is the size of the constraint system, m is
the size of $\underline{\mathcal{K}}$ and M the size of
$\overline{\mathcal{K}}$, and v is the number of unbounded variables.
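
The enumeration of pre-closures can be sketched with a Cartesian product over the variables to be bounded. This is an illustration under assumptions: variables are encoded as strings, extremal constructors as pairs (name, ()) with empty arity, and the function name `pre_closures` and the constructor names in the example are hypothetical.

```python
from itertools import product


def pre_closures(system, variables, minimal, maximal):
    """Yield every system obtained by bounding each variable.

    Each variable receives one lower bound taken among the minimal
    constructors and one upper bound taken among the maximal ones,
    all of empty arity, encoded as (name, ()).
    """
    variables = sorted(variables)
    choices = list(product(minimal, maximal))
    for picks in product(choices, repeat=len(variables)):
        extra = set()
        for var, (low, high) in zip(variables, picks):
            extra.add(((low, ()), var))    # low <= var
            extra.add((var, (high, ())))   # var <= high
        yield set(system) | extra


# Two minimal constructors, one maximal constructor, two variables.
closures = list(pre_closures({("a", "b")}, ["a", "b"], ["bot1", "bot2"], ["top1"]))
```

With m minimal and M maximal constructors and v variables, the generator yields (mM)^v systems, which is the source of the $m^v M^v$ factor in the complexity bound; each yielded system can then be passed to a closure test.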

NP-completeness is shown by using a result of Pratt and Tiuryn [12] for 
n-crowns [5]. 

Theorem 4. The satisfiability problem for subtyping constraints in quasi- 
lattices with a finite number of extrema each with an empty arity is NP-complete. 

The first condition imposed on $\mathcal{K}$ in theorem 3 expresses that the
extrema in the quasi-lattice of constructors have an empty arity. Without this
condition, it is worth noting that the introduction of a new constraint
$t_\alpha \leq \alpha$ (or $\alpha \leq t'_\alpha$) may also introduce some new
unbounded variables appearing in $t_\alpha$ (or $t'_\alpha$), which must be
bounded by introducing new constraints, which leads to the introduction of
infinitely many variables. Thus, the above algorithm cannot be used in that
case. Our result thus leaves open the decidability of the satisfiability of
non-structural subtyping constraints in quasi-lattices where some extrema have
a non-empty arity.

4 Computing Explicit Solutions 

Although testing the satisfiability of subtyping constraints is sufficient for
type checking, type inference requires exhibiting a solution of a constraint
system, not just checking the existence of a solution.

In lattices, Pottier [10,11] describes an algorithm for simplifying subtyping
constraint systems and computing explicit solutions, by identifying type
variables to their bounds. This algorithm transforms a constraint system C into
a canonical constraint system noted Can(C), which is closed thus satisfiable.
We extend here this algorithm to the case of quasi-lattices by adding some
specific simplification rules.

Let us assume, as in section 3, that constraints are formed upon flat terms
and that the constraint system to be simplified is pre-closed. In order to
solve a constraint system C in a quasi-lattice of types
$\mathcal{T}(\mathcal{K})$, the set of constructors $\mathcal{K}$ is completed
in a lattice $\mathcal{K}^{\top,\bot}$ by adding $\top$ and $\bot$ elements
with an empty arity and $\bot \leq k \leq \top$ for all $k \in \mathcal{K}$.
The constraint system C is first solved in the lattice of types
$\mathcal{T}(\mathcal{K}^{\top,\bot})$ by computing Can(C). Then, another set
of rules (given in Table 2) is applied over Can(C), in order to compute a
solution in the original quasi-lattice.

Pottier's algorithm may introduce variables for representing the greatest
lower or least upper bounds of a set of original variables in C. For a set A
of variables, $\gamma_A$ denotes a variable representing the greatest lower
bound of A and $\lambda_A$ is a variable representing its least upper bound.
For each variable $\alpha$ in Can(C), there is exactly one type $t \notin V$
(resp. $t' \notin V$) such that $\alpha \leq t \in Can(C)$ (resp.
$t' \leq \alpha \in Can(C)$) [11]. This type is called the upper (resp. lower)
bound of $\alpha$ in Can(C) and is noted $\Uparrow_{Can(C)} \alpha$ (resp.
$\Downarrow_{Can(C)} \alpha$).

Let us consider the type constructors int, string, list and nhlist, with
$a(list) = \{l\}$ and $list \leq_\mathcal{K} nhlist$, and the constraint system
$C = \{list(int) \leq \alpha, list(string) \leq \alpha\}$. A naive algorithm
would try to find a common upper bound of int and string and thus fail, whilst
$\alpha = nhlist$ is a solution to C. Some specific rules must thus be defined
to cover such cases. They are given in Table 2.



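Table 2. Additional rules for computing bounds in a quasi-lattice.

(Down $\bot$) $D, \gamma_A \leq \bot, \alpha \leq t \;\to\; D, \gamma_A \leq \bot, \alpha \leq t'$ if $t/l = \gamma_A$,
    where $t'(\varepsilon) = \sqcup(\downarrow_{a(t(\varepsilon)) \setminus \{l\}} t(\varepsilon))$ if it is defined, $t'(\varepsilon) = \bot$ otherwise,
    and for all labels $l' \in a(t'(\varepsilon))$, $t'/l' = t/l'$.

(Down $\top$) $D, \top \leq \lambda_A, \alpha \leq t \;\to\; D, \top \leq \lambda_A, \alpha \leq t'$ if $t/l = \lambda_A$,
    where $t'(\varepsilon) = \sqcup(\downarrow_{a(t(\varepsilon)) \setminus \{l\}} t(\varepsilon))$ if it is defined, $t'(\varepsilon) = \bot$ otherwise,
    and for all labels $l' \in a(t'(\varepsilon))$, $t'/l' = t/l'$.

(Up $\top$) $D, \top \leq \lambda_A, t \leq \alpha \;\to\; D, \top \leq \lambda_A, t' \leq \alpha$ if $t/l = \lambda_A$,
    where $t'(\varepsilon) = \sqcap(\uparrow_{a(t(\varepsilon)) \setminus \{l\}} t(\varepsilon))$ if it is defined, $t'(\varepsilon) = \top$ otherwise,
    and for all labels $l' \in a(t'(\varepsilon))$, $t'/l' = t/l'$.

(Up $\bot$) $D, \gamma_A \leq \bot, t \leq \alpha \;\to\; D, \gamma_A \leq \bot, t' \leq \alpha$ if $t/l = \gamma_A$,
    where $t'(\varepsilon) = \sqcap(\uparrow_{a(t(\varepsilon)) \setminus \{l\}} t(\varepsilon))$ if it is defined, $t'(\varepsilon) = \top$ otherwise,
    and for all labels $l' \in a(t'(\varepsilon))$, $t'/l' = t/l'$.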

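Here, we show how these rules can be applied to the previous example.
Let us consider the following pre-closed constraint system
$C = \{list(\beta) \leq \alpha, list(\delta) \leq \alpha, \alpha \leq nhlist, int \leq \beta \leq int, string \leq \delta \leq string\}$.
We have
$Can(C) = \{list(\lambda_{\{\beta,\delta\}}) \leq \alpha \leq nhlist, int \leq \beta \leq int, string \leq \delta \leq string, \top \leq \lambda_{\{\beta,\delta\}} \leq \top\}$.
By applying the rule (Up $\top$), we obtain
$D = \{nhlist \leq \alpha \leq nhlist, int \leq \beta \leq int, string \leq \delta \leq string, \top \leq \lambda_{\{\beta,\delta\}} \leq \top\}$,
which has a solution $\alpha = nhlist$, $\beta = int$, $\delta = string$ and
$\lambda_{\{\beta,\delta\}} = \top$.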

Proposition 4. The application of the rules of Table 2 preserves the solutions
whose codomain is included in $\mathcal{T}(\mathcal{K}) \cup \{\bot, \top\}$.

Proposition 5. The rules of Table 2 terminate and are confluent.

We note CanQ(D) the quasi-lattice canonical form of a constraint system D by
the rules $\to$ of Table 2.

Proposition 6. Let C be a constraint system and C' = CanQ(Can(C)). For
each variable $\alpha$ in C', there is exactly one type $t \notin V$ (resp.
$t' \notin V$) such that $\alpha \leq t \in C'$ (resp. $t' \leq \alpha \in C'$).



Theorem 5. Let C be a pre-closed constraint system, $C' = Can(C)$ and $C'' = CanQ(C')$. 
If $\mathcal{L}^- = \emptyset$ then the upper bounds (resp. lower bounds) 
in $C''$ define a maximal (resp. minimal) solution of C in $T(\mathcal{K})$. 

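It is worth noting that in the case where some type constructors have 
contravariant labels (in $\mathcal{L}^-$), there may be no maximal solution. For example, 
let us take $\mathcal{K} = \{int, float, \rightarrow\}$ with $int \le float$, $\mathcal{L}^+ = \{r\}$, $\mathcal{L}^- = \{a\}$, 
$a(int) = a(float) = \emptyset$, $a(\rightarrow) = \{a, r\}$. Let $C = \{\alpha \rightarrow \alpha \le \beta, \beta \le \alpha \rightarrow \alpha, int \le \alpha, \alpha \le float\}$. 
C is pre-closed and has two incomparable solutions, 
namely $\rho(\beta) = int \rightarrow int$, $\rho(\alpha) = int$ and $\rho'(\beta) = float \rightarrow float$, $\rho'(\alpha) = float$. 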

In the case where all type constructors are covariant ($\mathcal{L}^- = \emptyset$), our simplification 
rules thus give maximal and minimal solutions to pre-closed systems. The 
combination of the above algorithm with the pre-closure algorithm of Section 
3.2 gives a set of maximal and minimal solutions for non-pre-closed systems in 
quasi-lattices with a finite number of extrema with empty arities. 

5 Implementation and Applications 

5.1 Performance Evaluation 

The subtyping constraint algorithms described in this paper have been implemented 
[3] using the Constraint Handling Rules (CHR) language [8]. Table 3 
shows the type checking time (with type inference for variables) for 16 SICStus 
Prolog libraries. The second column corresponds to type checking using the algorithms 
presented in this paper for solving subtype inequalities in quasi-lattices. 
The third column corresponds to type checking using Pottier’s algorithms for 
solving subtype inequalities in lattices, which were also implemented in CHR. 
The last column is the ratio between the first and the second time. 



Table 3. Comparison between lattice and quasi-lattice implementations in CHR 

File       Type checking time            Ratio
           for quasi-lattice for lattice
arrays      0.78 s            1.73 s     2.21
lists       1.87 s            3.5 s      1.87
assoc       2.18 s            5.02 s     2.30
ordsets     2.38 s            5.89 s     2.47
atts        1.9 s             3.21 s     1.68
queues      0.43 s            1.03 s     2.39
bdb         3.17 s            5.89 s     1.85
random      0.8 s             0.92 s     1.15
charsio     0.41 s            0.96 s     2.34
sockets     1.83 s            4.33 s     2.36
clpr       46.85 s           69.63 s     1.48
terms       1.35 s            3.25 s     2.40
heaps       1.87 s            4.44 s     2.37
trees       0.81 s            2.39 s     2.95
jasper      0.98 s            1.6 s      1.63
ugraphs    14.14 s           53.19 s     3.76

The algorithm for quasi-lattices performs very well in practice, since the ratio 
lies between 1.15 and 3.76, with an average ratio of 2.04. Since the 
algorithms for quasi-lattices need to compute the pre-closure of the constraint system, 
one could expect a bigger difference due to some combinatorial explosion. 
However, the majority of type variables appearing in the constraint system obtained 
during type checking already have an upper bound. For these variables, 
it is sufficient to find one lower bound compatible with the existing upper bound 
and then propagate the information in the constraint system. For the other variables, 
which are completely unbounded, the strategy of delaying bound 
enumeration suffices to avoid the combinatorial 
explosion in practice. 
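
The summary figures can be recomputed from the timings of Table 3. The short Python sketch below is an illustration only; the variable names are ours. It suggests that the printed ratios correspond to the third column divided by the second, and that the reported 2.04 matches the ratio of the total times, whereas the plain mean of the per-file ratios comes out at about 2.2.

```python
# (file, time in column 2, time in column 3), in seconds, copied from Table 3.
TIMES = [
    ("arrays", 0.78, 1.73), ("lists", 1.87, 3.5), ("assoc", 2.18, 5.02),
    ("ordsets", 2.38, 5.89), ("atts", 1.9, 3.21), ("queues", 0.43, 1.03),
    ("bdb", 3.17, 5.89), ("random", 0.8, 0.92), ("charsio", 0.41, 0.96),
    ("sockets", 1.83, 4.33), ("clpr", 46.85, 69.63), ("terms", 1.35, 3.25),
    ("heaps", 1.87, 4.44), ("trees", 0.81, 2.39), ("jasper", 0.98, 1.6),
    ("ugraphs", 14.14, 53.19),
]

# Per-file ratio: third column divided by second column.
ratios = {name: third / second for name, second, third in TIMES}

# Ratio of the summed times, and the plain mean of the per-file ratios.
total_ratio = sum(t for _, _, t in TIMES) / sum(s for _, s, _ in TIMES)
mean_ratio = sum(ratios.values()) / len(ratios)

print(min(ratios.values()), max(ratios.values()))
print(round(total_ratio, 2), round(mean_ratio, 2))
```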



5.2 Applications 

The first application of solving non-structural subtyping constraints in quasi-lattices 
is in our type system TCLP [6,2] for constraint logic programs (CLP). 
In this covariant type system, the type of CLP variables is only constrained by 
upper bounds. Thus, in the case where the type structure forms a lattice, the 
type inference algorithm can always assign the empty type $\bot$ to variables, which 
means that no type error can be found on variables. On the other hand, in the 
case of a quasi-lattice of type constructors, the type inference algorithm detects 
incompatible types for variables, for example if a variable is constrained to have 
a type smaller than both int and $list(\alpha)$. Moreover, the quasi-lattice structure 
makes it possible to avoid the use of the metaprogramming type $term = \top$ in 
modules where this type is not supposed to be used. 

Another application can be found in the framework of type inference with 
subtyping for languages à la ML. In [10], Pottier uses subtyping constraints for 
type inference in a variant of ML with rows. However, in a lattice, the bottom 
element $\bot$ denotes the empty type, hence a function typed by $\bot \rightarrow \tau$ cannot 
be applied to any argument. The algorithm for solving subtyping constraints 
described in this paper allows one to use the quasi-lattice obtained by removing 
the $\bot$ element from the lattice as a type structure. A type error can then be 
produced instead of a typing with the empty type. 



6 Conclusion 

We have studied general forms of non-structural subtyping relations in the quasi-lattice 
of infinite (regular) types formed over a quasi-lattice of type constructors. 
We have shown the decidability of the satisfiability problem for subtyping constraints 
in quasi-lattices, by generalizing Trifonov and Smith’s algorithm for 
testing the satisfiability of subtyping constraints in lattices to the case of quasi-lattices, 
with a time complexity in $O(m^v M^v n^3)$ where m (resp. M) is the number 
of minimal (resp. maximal) elements of the quasi-lattice and v the number of 
unbounded variables. It is worth noting that the complexity of this algorithm is 
in $O(n^3)$ for constraint systems where all variables are bounded. In the general 
case we have shown the NP-completeness of this problem. 

We have also extended Pottier’s algorithm for computing solutions to the 
case of quasi-lattices, and have shown that the computed solutions are minimal 
(resp. maximal) solutions when all type constructors are covariant. Finally we 
have mentioned some applications of these algorithms to type inference problems 
in constraint logic programming and in functional programming languages. 

As for future work, one can mention some problems left open in this paper. 
We have already mentioned the case where the extrema of the quasi-lattice of 
constructors have a non-empty arity. The decidability of constraint satisfiability 
in finite types is also an open problem. In the homogeneous case (i.e. when the 
type constructors in a subtype relation have the same arity), Frey has shown 
that this problem is PSPACE-complete in arbitrary posets [7]. 

Abstract. Recently, Jain, Mahdian and Saberi [5] gave an FPTAS 
for the problem of computing a market equilibrium in the Arrow-Debreu 
setting, when the utilities are linear functions. Their running time depended 
on the size of the numbers representing the utilities and endowments 
of the buyers. In this paper, we give a strongly polynomial time 
approximation scheme for this problem. Our algorithm builds upon the 
main ideas behind the algorithm in [3]. 



1 Introduction 

General Equilibrium Theory, pioneered by Leon Walras [7], deals with the com- 
plex interaction between agents in a market, each willing to trade the goods he 
possesses for ones he desires. The demand for each good is determined as follows: 
Each good is assigned a price, the buyers sell their endowments at these prices, 
and buy the optimal bundle of goods they can afford. A market equilibrium cor- 
responds to a situation when the demand and supply for each good are exactly 
balanced. The prices are called equilibrium prices, or market clearing prices. The 
goods are assumed to be divisible. The desirability of each bundle of goods is 
expressed by a utility function. In their seminal work, Arrow and Debreu [1] 
proved the existence of equilibrium prices in this model of a market, when the 
utilities are concave functions. However, their proof appeals to fixed point theo- 
rems and is non constructive. The ultimate goal of equilibrium theory as a tool 
for predicting and evaluating economic policies can only be achieved if one can 
actually find an equilibrium. There have been some impressive algorithms for 
this problem, most notably by Scarf [6], but no polynomial time algorithm is 
known. 

Of special interest, from a computational point of view, is the case when 
the utilities are linear functions. Deng, Papadimitriou and Safra [2] stated this 
particular case as open. Jain, Mahdian and Saberi [5] gave an FPTAS for it. Here, we improve their result to 
give a strongly polynomial time approximation scheme. We now formally define 
the model: 

1.1 Formal Setting 

First, a note about notation. We will use bold face Roman letters to denote 
vectors. If x is a vector, then the i-th component of x will be denoted by $x_i$. 



P.K. Pandya and J. Radhakrishnan (Eds.): FSTTCS 2003, LNCS 2914, pp. 149—155, 2003. 
© Springer- Verlag Berlin Heidelberg 2003 







N.R. Devanur and V.V. Vazirani 



$|\cdot|$ and $\|\cdot\|$ denote the $\ell_1$ and the $\ell_2$-norms of a vector, respectively. A market 
consists of: 

1. A set of divisible goods, say A, and a set of buyers, say B. W.l.o.g., we 
may assume that $A = \{1, 2, \ldots, n\}$, $B = \{1, 2, \ldots, n'\}$ and that the amount 
available of each good is unity. 

2. The desirability of a bundle of goods, measured by the total order defined 
by a utility function, $U_i(\mathbf{x}) = \sum_{j \in A} u_{ij} x_j$, for each buyer $i \in B$. A bundle 
$\mathbf{x} \in [0, 1]^n$ is more desirable to i than $\mathbf{x}'$ if and only if $U_i(\mathbf{x}) > U_i(\mathbf{x}')$. 

3. The endowments of the buyers, which they want to trade for the goods. In 
the Fisher setting, the endowment of buyer i is $m_i$ units of money. In the 
AD setting the endowment of each agent i is a bundle of goods $\mathbf{e}_i \in [0, 1]^n$ 
(instead of money, as before). The $\mathbf{e}_i$’s satisfy: $\forall j \in A$, $\sum_{i \in B} e_{ij} = 1$. 

Therefore, an instance of a market consists of the 4-tuple 

$(n, n', (U_1, U_2, \ldots, U_{n'}), (\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{n'}))$. 

An allocation is a distribution of goods among the buyers. It is represented 
by the vectors $\mathbf{x}_i \in [0, 1]^n$, $\forall i \in B$. Trade in the market is facilitated by the 
introduction of prices. Let $\mathbf{p} = (p_1, p_2, \ldots, p_n) \in \mathbf{R}^n$ denote the price vector ($p_j$ 
is the price of good j). Given these prices, an allocation is said to be a market 
clearing allocation if it satisfies: 

Budget constraint: The buyer cannot spend more than what he has. In the 
Fisher setting, this translates to: $\mathbf{x}_i \cdot \mathbf{p} \le m_i$, $\forall i \in B$. In the AD setting the 
amount of money with a buyer, $\mathbf{e}_i \cdot \mathbf{p}$, depends on the prices. 

Optimality: For each buyer, no other bundle of goods that satisfies the budget 
constraint is more desirable than the one allocated. 

Market clearing: There is neither deficiency nor surplus of any goods: $\forall j \in A$, $\sum_{i \in B} x_{ij} = 1$. 

A price vector for which a market clearing allocation exists is called a market 
clearing price or equilibrium price. Given an instance of a market, the goal is 
to compute a market clearing price and a market clearing allocation, 
which together we call a market equilibrium. 
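
The three conditions can be tested directly on a candidate allocation. The Python sketch below covers the Fisher setting only and is an illustration under stated assumptions: prices are positive, and optimality is tested through the usual sufficient criterion for linear utilities, namely that a buyer spends his entire budget, and only on goods that maximise utility per unit of money. The function name and its parameters are hypothetical.

```python
def is_market_clearing_fisher(u, m, p, x, tol=1e-9):
    """Sufficient test for a market clearing allocation in the Fisher setting.

    u[i][j]: utility rate of buyer i for good j; m[i]: money of buyer i;
    p[j]: price of good j (assumed positive); x[i][j]: amount of good j for buyer i.
    """
    buyers, goods = range(len(u)), range(len(p))
    for i in buyers:
        spent = sum(x[i][j] * p[j] for j in goods)
        # Budget constraint: buyer i cannot spend more than m[i].
        if spent > m[i] + tol:
            return False
        # Optimality (assumed criterion): all money is spent ...
        if abs(spent - m[i]) > tol:
            return False
        # ... and only on goods with the best utility per unit of money.
        best = max(u[i][j] / p[j] for j in goods)
        if any(x[i][j] > tol and u[i][j] / p[j] < best - tol for j in goods):
            return False
    # Market clearing: each good (unit supply) is sold exactly.
    return all(abs(sum(x[i][j] for i in buyers) - 1.0) <= tol for j in goods)


u = [[2, 1], [1, 2]]   # buyer 0 prefers good 0, buyer 1 prefers good 1
m = [1, 1]
print(is_market_clearing_fisher(u, m, [1, 1], [[1, 0], [0, 1]]))
print(is_market_clearing_fisher(u, m, [2, 1], [[1, 0], [0, 1]]))
```

In the AD setting the same test would apply with m[i] replaced by the value of buyer i's endowment at the given prices.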



Approximate Market Equilibrium. As defined earlier, an allocation is market 
clearing if it satisfies the 3 conditions: Budget constraint, Optimality and 
Market clearing. We define 2 different notions of approximate market equilibrium 
by relaxing the Optimality and the Market clearing conditions. 

Definition 1 (Approximate Market clearing). An allocation satisfies the $\epsilon$-approximate 
market clearing condition if neither deficiency nor surplus of goods 
is too high in value: 

$|\xi(\mathbf{p}) - \mathbf{p}| \le \epsilon$, $|\mathbf{p}| = 1$ 

where $\xi()$ is the demand in terms of money, i.e., $\xi_j(\mathbf{p}) = \sum_{i \in B} x_{ij} p_j$. 

An allocation (and hence the price) is said to be $\epsilon$-approximate market clearing 
if it satisfies the Budget constraint, Optimality and $\epsilon$-approximate Market 
clearing conditions. 

The main result of the paper is: 

Theorem 1. For all e > 0 there is an algorithm that for any instance of the 
Market in the AD setting, gives an e-approximate market equilibrium and needs 
O log max-flow computations. 

In the Fisher setting, there is a demarcation between the buyers and the 
sellers. Each buyer comes with a specified amount of money. As a result, the 
equilibrium prices are unique. The algorithm in [3] (DPSV algorithm) starts with 
prices so low that the buyers have an excess of money. It iteratively increases 
the prices so that the surplus money with the buyers keeps decreasing. When 
the surplus vanishes, equilibrium is attained. 

In the AD setting, the equilibrium prices are not unique. So we start with 
arbitrary prices and compute the buyers’ budgets from their initial endowments. 
Let P be the total price of all goods, also equal to the total budget of all buyers. 
Let f be the maximum sales possible, such that each buyer buys only the goods 
that give him the maximum “bang per buck” while not exceeding his budget. 
The algorithm modifies the prices so that the ratio $f/P$ approaches 1. 



1.2 Related Work 

Deng, Papadimitriou and Safra [2] gave a polynomial time algorithm for the AD 
setting when the number of goods is bounded. They also stated as open the problem 
of doing the same for an unbounded number of goods. A partial answer 
to this was given by Devanur et al. [3]. They gave a polynomial time algorithm 
for the Fisher setting, with linear utilities and no constraint on the number 
of goods. However, no polynomial time algorithm is known for the AD setting, 
even when the utilities are linear. Jain, Mahdian and Saberi [5] gave an FPTAS 
for this case. In particular they get an e-approximate approximation that re- 
quires O ^^(logn -I- nlog [/ -I- logM)^ max-flow computations, where M and 
U depend on the endowments and utility functions. Their algorithm depends on 
the size of the numbers giving the utility rates and endowments of the buyers. In 
this paper we give a strongly polynomial time approximation scheme that runs 
in time O log , where n is the number of buyers. Note that the running 
time of our algorithm does not depend on the size of the utility rates and en- 
dowments of the buyers. This is analogous to the standard notion of strongly 
polynomial time algorithms where the running time is independent of the size of 
the numbers occurring in the instance. The improvement comes about because 
[5] use the algorithm in [3] as a black box, whereas we open it up and build upon 
the main ideas in [3]. 

1.3 Organization 

The paper is organized as follows: In Section 2, the basic definitions and results 
of [3] are summarized. The algorithm is stated in Section 3 and analyzed in 
Section 4. Conclusion and Open Problems are in Section 5. 

2 Preliminaries 

This section summarizes the basic definitions and algorithm of [3] that we need 
here. Some of the results that we state here appear only in the full version [4]. 
Let $\mathbf{p} = (p_1, \ldots, p_n)$ be any price vector. $\alpha_i = \max_{j \in A} \{u_{ij}/p_j\}$ is the maximum 
bang per buck that buyer i can get from any good. He is equally happy with any 
combination of goods attaining this maximum. Define the bipartite graph G with 
bipartition (A, B) and for $i \in B$, $j \in A$, (i, j) is an edge in G iff $\alpha_i = u_{ij}/p_j$. Call 
this the equality subgraph and its edges the equality edges. Consider the following 
network: Direct edges of G from A to B and assign a capacity of infinity to all 
these edges. Introduce source vertex s and a directed edge from s to each vertex 
$j \in A$ with a capacity of $p_j$. Introduce sink vertex t and a directed edge from 
each vertex $i \in B$ to t with a capacity of $m_i$. This network will be denoted $N(\mathbf{p})$. 
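
The construction can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the implementation of [3]: prices are taken to be positive, the capacity of the edge from buyer i to the sink is taken to be the buyer's money m[i], and a plain Edmonds-Karp routine stands in for whatever max-flow algorithm is used. The names equality_edges, network and max_flow are hypothetical.

```python
from collections import deque


def equality_edges(u, p):
    """Pairs (i, j) where u[i][j] / p[j] equals buyer i's maximum bang per buck."""
    edges = []
    for i, row in enumerate(u):
        alpha = max(uij / pj for uij, pj in zip(row, p))
        edges += [(i, j) for j, (uij, pj) in enumerate(zip(row, p))
                  if abs(uij / pj - alpha) < 1e-12]
    return edges


def network(u, p, m):
    """Capacities of N(p): s -> good j (p[j]), good j -> buyer i (inf), buyer i -> t (m[i])."""
    cap = {"s": {}, "t": {}}
    for j, pj in enumerate(p):
        cap[("good", j)] = {}
        cap["s"][("good", j)] = pj
    for i, mi in enumerate(m):
        cap[("buyer", i)] = {"t": mi}
    for i, j in equality_edges(u, p):
        cap[("good", j)][("buyer", i)] = float("inf")
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts of residual capacities (modified in place)."""
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:      # breadth-first search for a path
            v = queue.popleft()
            for w, c in cap[v].items():
                if c > 1e-12 and w not in parent:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            return total
        path, w = [], t
        while parent[w] is not None:          # recover the path from t back to s
            path.append((parent[w], w))
            w = parent[w]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:                     # update residual capacities
            cap[a][b] -= push
            cap[b][a] = cap[b].get(a, 0.0) + push
        total += push


# Both buyers want only good 0, so at most p[0] = 1 can be sold: f / P = 1 / 2.
print(max_flow(network([[2, 1], [2, 1]], [1, 1], [1, 1]), "s", "t"))
```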

W.r.t. prices p, for $T \subseteq B$, define its money $M(T) := \sum_{i \in T} m_i$. Similarly, 
for a set $S \subseteq A$, define its money $P(S) := \sum_{j \in S} p_j$ and $f(S)$ = the maximum 
flow through S; the context will clarify the price vector p. For $S \subseteq A$, define its 
neighborhood in $N(\mathbf{p})$ 

$\Gamma(S) = \{i \in B \mid \exists j \in S \text{ with } (i, j) \in G\}$. 

By the assumption that each good has a potential buyer, $\Gamma(A) = B$. Let $M = M(B)$, 
$P = P(A)$ and $f = f(A)$. 

For a given flow f in the network $N(\mathbf{p})$, define the surplus of buyer i, $\gamma_i(\mathbf{p}, f)$, 
to be the residual capacity of the edge (i, t) with respect to f, which is equal 
to $m_i$ minus the flow sent through the edge (i, t). Define the surplus vector 
$\gamma(\mathbf{p}, f) := (\gamma_1(\mathbf{p}, f), \gamma_2(\mathbf{p}, f), \ldots, \gamma_n(\mathbf{p}, f))$. 

Definition 2 (Balanced flow). For any given p, a max flow that minimizes 
$\|\gamma(\mathbf{p}, f)\|$ over all choices of f is called a balanced flow. 

If $\|\gamma(\mathbf{p}, f)\| < \|\gamma(\mathbf{p}, f')\|$, we say f is more balanced than f'. 

For a given p and a flow f in $N(\mathbf{p})$, let $R(\mathbf{p}, f)$ be the residual network of 
$N(\mathbf{p})$ with respect to the flow f. The following lemma characterizes all balanced 
flows: 

Lemma 1. ([4]) A max flow f is balanced if and only if the residual network 
w.r.t. the flow, $R(\mathbf{p}, f)$, is such that if there is a path from a buyer j to another 
buyer i in $R(\mathbf{p}, f) \setminus \{s, t\}$, then $\gamma_i(\mathbf{p}, f) \le \gamma_j(\mathbf{p}, f)$. 

Lemma 2. ([4]) For any given p, if f, f' are balanced flows, then $\gamma(\mathbf{p}, f) = \gamma(\mathbf{p}, f')$. 

As a result, one can define the surplus vector for a given price as $\gamma(\mathbf{p}) := \gamma(\mathbf{p}, f)$ 
where f is the balanced flow in $N(\mathbf{p})$. 

Lemma 3. ([4]) A balanced flow in $N(\mathbf{p})$ can be found using $O(n)$ max-flow 
computations. 

3 The Algorithm 

Here we describe what we call one phase of the algorithm. At the beginning of 
each phase, assume that a price vector p and a vector of incomes, m, are given. 
Construct the network $N(\mathbf{p})$ and find a balanced flow in it. The graph G is 
then partitioned into an active and a frozen subgraph. The algorithm proceeds 
by raising the prices of goods in the active subgraph. Let $H \subseteq B$ be the set 
of buyers whose surplus is equal to the maximum surplus in B, say $\delta$. $H' \subseteq A$ 
is the set of goods adjacent to at least one buyer in H. The active graph is 
initialized to (H, H'). Let (F, F') denote the frozen part, that is $F := B \setminus H$ and 
$F' := A \setminus H'$. Prices of goods in H' are raised in such a way that the equality 
edges in it are retained. This is ensured by multiplying prices of all these goods 
by x and gradually increasing x, starting with x = 1. Note that there are no 
edges from H to F'. This ensures that the edges from H to H' remain in the 
equality graph. Also, all edges from F to H' are deleted, since they go out of 
the equality graph as soon as the prices in H' are raised. 

Each phase is divided into iterations in which the prices of goods in H' are 
increased until one of the following two events happens: 

1. A new edge (i, j) appears: This happens because for buyers in H, the goods 
in F' are getting relatively less expensive and hence more desirable. First 
compute a balanced flow f for the new equality subgraph. If some buyer in 
H has a surplus less than $\delta/2$ then that is the end of the phase. If all the 
buyers in H have surplus at least $\delta/2$ then add to H all the vertices that 
can reach any vertex in H in the residual network corresponding to f in G. 
Continue with the next iteration. 

2. A set goes tight: That is, for some $S \subseteq H'$, $P(S) = M(\Gamma(S))$. The surplus of 
some of the buyers in H drops to zero and that terminates the phase. 

Note that finding which edge appears next is easy. Also, [3] prove that finding 
the first set to go tight can be done using $O(n)$ max-flow computations. 
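
To see why the next edge is easy to find, note that once the prices in H' have been multiplied by x, a buyer i in H gets a bang per buck of $\alpha_i/x$ from his current goods, while a frozen good j in F' still offers $u_{ij}/p_j$. The edge (i, j) therefore appears at $x = \alpha_i p_j / u_{ij}$, and the next event is the minimum of these values. The Python sketch below illustrates this calculation; the function name and argument layout are assumptions made for the example.

```python
def next_edge(u, p, alpha, H, F_goods):
    """Return (x, i, j): the smallest price multiplier x at which some buyer i in H
    becomes interested in a frozen good j, or None if no such edge can appear.

    alpha[i] is buyer i's current maximum bang per buck; p holds current prices.
    """
    best = None
    for i in H:
        for j in F_goods:
            if u[i][j] > 0:
                x = alpha[i] * p[j] / u[i][j]   # solves alpha[i] / x == u[i][j] / p[j]
                if best is None or x < best[0]:
                    best = (x, i, j)
    return best


# One buyer with utilities (2, 1) at unit prices: good 1 becomes equally
# attractive once the price of good 0 has doubled.
print(next_edge([[2, 1]], [1, 1], [2], [0], [1]))
```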

Lemma 4. ([4]) The number of iterations executed in a phase is at most n. 
Hence each phase requires $O(n^2)$ max-flow computations. 

In each phase, the $\ell_2$ norm of the surplus vector is reduced by a polynomial 
fraction. 

Lemma 5. ([4]) If $\mathbf{p}_0$ and $\mathbf{p}^*$ are price vectors before and after a phase, 
$\|\gamma(\mathbf{p}^*)\|^2 \le \|\gamma(\mathbf{p}_0)\|^2 \left(1 - \frac{1}{n^2}\right)$. 

An epoch of the algorithm involves running several phases, with a fixed m. 
Typically, an epoch ends when $|\gamma(\mathbf{p})|$ drops to a specified fraction of its initial 
value. 

Lemma 6. If $\mathbf{p}_0$ and $\mathbf{p}^*$ are price vectors before and after an epoch, then the 
number of phases in the epoch is $O\left(n^2 \log \frac{|\gamma(\mathbf{p}_0)|}{|\gamma(\mathbf{p}^*)|}\right)$. 

The main difference between the Fisher setting and the AD setting is that the 
incomes of the buyers are fixed in the Fisher setting, whereas they are dependent 
on the prices in the AD setting. In order to avoid having to change the incomes 
continuously, the algorithm only updates them at the end of each epoch. 

The main algorithm is as follows: Start with the price vector $\mathbf{p} = \mathbf{1}^n$ and 
compute the incomes. Run an epoch until $|\gamma(\mathbf{p})| < n\epsilon$. If at this point either 
$P - M < n\epsilon$ or $P > n/\epsilon$ then end the algorithm. Otherwise update the incomes 
and run the next epoch. 



4 Analysis of the Algorithm 

Lemma 7. A price p is $2\epsilon$-approximate market clearing if w.r.t. p, 

$\frac{P - f}{P} \le \epsilon$. 

Proof. It follows from the observation that $|\xi(\mathbf{p}) - \mathbf{p}| \le 2(P - f)$. 

Proof (of Theorem 1). Correctness: Note that $P \ge n$ and $P \ge f$. $|\gamma(\mathbf{p})| = M - f$. 
Since the algorithm always increases the prices of goods in H', any 
increase in P always results in an equal increase in f. Each subsequent run 
starts with the prices and flow obtained in the previous run. Hence $P - f$ never 
increases. $P - f \le n$. If when the algorithm ends, $P - M < n\epsilon$, then 

$\frac{P - f}{P} = \frac{(P - M) + (M - f)}{P} < \frac{2n\epsilon}{n} = 2\epsilon$. 

On the other hand, if $P > n/\epsilon$ then again 

$\frac{P - f}{P} \le \frac{n}{n/\epsilon} = \epsilon$. 

Running time: Since at the beginning of each epoch |7(p)| < P - f < n 
and the epoch ends if |7(p)| < ne, there are 0(n^ log phases in each epoch. If 
in each epoch P — M > ne, then after ^ epochs P > y. Moreover, from Lemma 
4, each phase needs 0{n^) max-flow computations. Hence the algorithm needs 

O log max- flow computations. 

The running time can be brought down by a more complicated rule to end 
epochs (and update incomes): During the running of the algorithm, we maintain 
a variable a such that P — f < na. Initially, a = 1. Run an epoch until |7(p)| = 
M — f < na/4. If at this stage P > ^ then end the algorithm. If P — M < ^ 
then a ← a/2. If a < ε then end the algorithm. Otherwise, update the incomes 
and continue with the next epoch. 

It is clear that the algorithm is correct. At the end of each epoch either 
a ^ a/2 or P increases by at least The former can happen 0(log times, 
and the latter O(^) times. The theorem follows. 

5 Conclusion and Open Problems 

In this paper, we give a strongly polynomial time approximation scheme for 
the problem of computing market equilibrium with linear utilities. We leave 
open the problem of finding an exact equilibrium in polynomial time. The AD 
setting appears to be computationally harder than the Fisher setting (for which a 
polynomial time exact algorithm is known [3]) . For one, the incomes of the buyers 
are changing with the prices. Hence any algorithm that iteratively improves the 
prices (like the DPSV algorithm) is chasing a moving target. Moreover, it does 
not support unique equilibrium prices. Consider two agents, each coming to the 
market with a unit amount of distinct goods. Suppose that the utility of each 
agent for her good far outweighs the utility for the other good. Then, for a whole 
continuum of prices we have market equilibria in which each agent buys only 
what she has. This example may also be pointing out the difficulty of obtaining a 
polynomial time algorithm for this model, even when restricted to linear utilities. 
The difficulty is: which equilibrium price should the algorithm shoot for? Note 
that even when a discrete algorithm is faced with multiple, though discrete, 
solutions, uniqueness is arbitrarily imposed - by breaking ties arbitrarily, and 
asking for the lexicographically first solution under the imposed ordering. 

Acknowledgments. We would like to thank Nisheeth Vishnoi for providing 

Abstract. For a restricted class of monoids, we prove that the decidability 
of the existential theory of word equations is preserved under graph 
products. Furthermore, we show that the positive theory of a graph product 
of groups can be reduced to the positive theories of some of the factor 
monoids and the existential theories of the remaining factors. Both results 
also include suitable constraints for the variables. Larger classes of 
constraints lead in many cases to undecidability results. 



1 Introduction 

Since the seminal work of Makanin [14] on equations in free monoids, the decid- 
ability of various theories of equations in different monoids and groups has been 
studied, and several new decidability and complexity results have been shown. 
Let us mention here the results of [17,20] for free monoids, [3,15] for free groups, 
[7] for free partially commutative monoids (trace monoids), [8] for free partially 
commutative groups (graph groups), [4] for plain groups (free products of finite 
and free groups), and [19] for torsion-free hyperbolic groups. 

In this paper, we will continue this stream of research by considering graph 
products (Section 2.2). The graph product construction is a well-known con- 
struction in mathematics, see e.g. [12,13], that generalizes both free products 
and direct products: An independence relation on the factors of the graph prod- 
uct specifies, which monoids are allowed to commute elementwise. Section 3 
deals with existential theories of graph products. Using a general closure result 
for existential theories (Thm. 2), we will show in Section 3.2 that under some 
algebraic restriction on the factors of a graph product, the decidability of the 
existential theory of word equations is preserved under graph products (Thm. 4) . 
This closure result remains also valid if we allow constraints for variables, which 
means that the value of a variable may be restricted to some specified set. More 
precisely, we will define an operation, which, starting from a class of constraints 
for each factor monoid of the graph product, constructs a class of constraints for 
the graph product. We will also present an upper bound for the space complexity 
of the existential theory of the graph product in terms of the space complexities 
for the existential theories of the factor monoids. Using known results from [21] it 
follows that the existential theory of word equations of a graph product of finite 
monoids, free monoids, and torsion-free hyperbolic groups is decidable. This result 
generalizes the decidability result for graph products of finite monoids, free 
monoids, and free groups from [5]. 

In Section 4 we consider positive theories of equations. It turns out that the 
positive theory of word equations of a graph product of groups with recogniz- 
able constraints can be reduced to (i) the positive theories with recognizable 
constraints of those factors of the graph product that are allowed to commute 
elementwise with all the other factors and (ii) the existential theories of the 
remaining factors. 

Proofs that are omitted in this paper can be found in the full version [6] . 



2 Preliminaries 

Let A be a possibly infinite alphabet. The empty word over A is $\varepsilon$. A partial 
involution $\iota$ on A is a partial function on A with $\iota(\iota(a)) = a$ for all $a \in dom(\iota)$. 
Let $\mathcal{M} = (M, \circ, 1)$ be a monoid. A subset $L \subseteq M$ is recognizable if there exists 
a homomorphism $h : \mathcal{M} \rightarrow Q$ to a finite monoid Q such that $L = h^{-1}(F)$ for 
some $F \subseteq Q$. With $REC(\mathcal{M})$ we denote the class of all recognizable subsets of 
$\mathcal{M}$; it is a Boolean algebra. The set $RAT(\mathcal{M})$ of rational subsets of $\mathcal{M}$ is defined 
via rational expressions over $\mathcal{M}$, see e.g. [21. If $\mathcal{M}$ is finitely generated, then 
$REC(\mathcal{M}) \subseteq RAT(\mathcal{M})$. 

2.1 Mazurkiewicz Traces 

For a detailed introduction to trace theory see [9]. An independence alphabet is 
a pair (A, I), where A is a possibly infinite set and $I \subseteq A \times A$ is a symmetric 
and irreflexive independence relation. Its complement $D = (A \times A) \setminus I$ is the 
dependence relation. The pair (A, D) is called a dependence alphabet. For $a \in A$ 
let $I(a) = \{b \in A \mid (a, b) \in I\}$ and $D(a) = A \setminus I(a)$. Let $\equiv_I$ be the smallest 
congruence on $A^*$ that contains all pairs (ab, ba) with $(a, b) \in I$. The trace monoid 
(free partially commutative monoid) $\mathbb{M}(A, I)$ is the quotient monoid $A^*/\equiv_I$; its 
elements are called traces. Since A is not necessarily finite, we do not restrict to 
finitely generated trace monoids. Extreme cases are free monoids (if $D = A \times A$) 
and free commutative monoids (if $D = \{(a, a) \mid a \in A\}$). The trace represented 
by the word $s \in A^*$ is denoted by $[s]_I$. For $R \subseteq \mathbb{M}(A, I) \times \mathbb{M}(A, I)$ we denote with 
$\mathbb{M}(A, I)/R$ the quotient monoid of $\mathbb{M}(A, I)$ w.r.t. the smallest congruence on 
$\mathbb{M}(A, I)$ containing R. If A is finite, then it is easy to see that $L \in REC(\mathbb{M}(A, I))$ 
if and only if $\{s \in A^* \mid [s]_I \in L\}$ is regular. 
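
For a finite alphabet, equality of traces can be tested without enumerating commutations. The Python sketch below relies on the standard projection criterion, which is not stated in this paper: two words represent the same trace exactly when they contain the same letters with the same multiplicities and agree on their projections onto every pair of dependent letters. The function names are hypothetical, and the independence relation is passed as a set of pairs.

```python
from itertools import combinations


def project(word, letters):
    """Erase from word every symbol that is not in letters."""
    return [a for a in word if a in letters]


def trace_equivalent(s, t, independence):
    """Decide whether the words s and t represent the same trace.

    independence is a set of pairs (a, b); it is treated as symmetric.
    """
    if sorted(s) != sorted(t):          # same letters with the same multiplicities
        return False
    for a, b in combinations(sorted(set(s)), 2):
        dependent = (a, b) not in independence and (b, a) not in independence
        if dependent and project(s, {a, b}) != project(t, {a, b}):
            return False
    return True


I = {("a", "b")}                        # a and b commute, c depends on both
print(trace_equivalent("abc", "bac", I))
print(trace_equivalent("acb", "abc", I))
```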

We define on A an equivalence relation $\sim$ by $a \sim b$ if and only if $D(a) = D(b)$ 
(or equivalently $I(a) = I(b)$). Since D is reflexive, $\sim \subseteq D$. An equivalence class 
of $\sim$ is called a complete clan of (A, I). In the sequel we will briefly speak of 
clans. A clan C is thin [8] if $D(a) \neq A$ for some (and hence all) $a \in C$. The 
cardinality of the set of thin clans is denoted by c(A, I); of course it may be 
infinite. Note that $c(A, I) \neq 1$, and $c(A, I) = 0 \Leftrightarrow I = \emptyset$. 
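
For a finite independence alphabet the clans can be computed by grouping letters with the same set of independent letters. The Python sketch below is an illustration only; it reads the thinness condition as "the letter is independent of at least one other letter", which is an assumption about the intended definition, and the function name is hypothetical.

```python
def clans(alphabet, independence):
    """Return a list of (clan, is_thin) pairs for a finite independence alphabet.

    Letters a and b lie in the same clan when I(a) = I(b), which is equivalent
    to D(a) = D(b). A clan is reported as thin when I(a) is non-empty
    (assumed reading of the definition).
    """
    sym = set(independence) | {(b, a) for a, b in independence}

    def independent_letters(a):
        return frozenset(b for b in alphabet if (a, b) in sym)

    groups = {}
    for a in alphabet:
        groups.setdefault(independent_letters(a), set()).add(a)
    return [(clan, bool(indep)) for indep, clan in groups.items()]


result = clans({"a", "b", "c"}, {("a", "b")})
print(sum(1 for _, thin in result if thin))   # number of thin clans
```

With an empty independence relation the whole alphabet forms a single clan that is not thin, in line with the remark that c(A, I) = 0 exactly when I is empty.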

A partial function f on A is compatible with I if $(a, b) \in I$ and $a, b \in dom(f)$ 
imply $(f(a), f(b)) \in I$. This allows us to lift f to a partial function on $\mathbb{M}(A, I)$ 
by setting $f([a_1 \cdots a_n]_I) = [f(a_n) \cdots f(a_1)]_I$. The domain of this lifting is 
$\mathbb{M}(dom(f), I)$. Note that we reverse the order of the symbols in the f-image 
of a trace. In our applications, f will always be a partial injection on A, like 
for instance a partial involution $\iota$ on A. In the latter case, the lifting of $\iota$ to 
$\mathbb{M}(dom(\iota), I)$ is again a partial involution on $\mathbb{M}(A, I)$. 

2.2 Graph Products 

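Graph products [12] generalize both free products and direct products. Let 
$(\Sigma, I_\Sigma)$ be an independence alphabet with $\Sigma$ finite, and let $\mathcal{M}_\sigma = (M_\sigma, \circ_\sigma, 1_\sigma)$ 
be a monoid for every $\sigma \in \Sigma$. Define an independence alphabet (A, I) by 
$A = \bigcup_{\sigma \in \Sigma} M_\sigma$ and $I = \bigcup_{(\sigma, \tau) \in I_\Sigma} M_\sigma \times M_\tau$, where w.l.o.g. $M_\sigma \cap M_\tau = \emptyset$ for 
$\sigma \neq \tau$. Define $R \subseteq \mathbb{M}(A, I) \times \mathbb{M}(A, I)$ by 

$R = \bigcup_{\sigma \in \Sigma} \{(ab, c) \mid a, b, c \in M_\sigma, a \circ_\sigma b = c\} \cup \{(1_\sigma, \varepsilon)\}$. 

The graph product $\mathbb{P} = \mathbb{P}(\Sigma, I_\Sigma, (\mathcal{M}_\sigma)_{\sigma \in \Sigma})$ is the quotient monoid $\mathbb{M}(A, I)/R$. 

In case $I_\Sigma = \emptyset$ (resp. $I_\Sigma = (\Sigma \times \Sigma) \setminus Id_\Sigma$) we obtain the free (resp. direct) 
product of the $\mathcal{M}_\sigma$. 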

For $s, t \in \mathbb{M}(A, I)$ we write $s \rightarrow_R t$ if there exist $u, v \in \mathbb{M}(A, I)$ and $(\ell, r) \in R$ 
with $s = u \ell v$ and $t = u r v$. Let $IRR = \{s \in \mathbb{M}(A, I) \mid \neg\exists t : s \rightarrow_R t\}$. The relation 
$\rightarrow_R$ is clearly Noetherian and also confluent (see [6, Lemma 2.2]). It follows that 
there is a natural bijection $\omega : \mathbb{P} \rightarrow IRR$ such that $x \in \mathbb{P}$ is represented by 
the trace $\omega(x)$. Moreover, for $x, y, z \in \mathbb{P}$ we have $xy = z$ in $\mathbb{P}$ if and only if 
$\omega(x)\omega(y) \stackrel{*}{\rightarrow}_R \omega(z)$. Note that $\mathbb{M}(A, I)$ is in general not finitely generated. 

2.3 Relational Structures and Logic 

Let us fix a relational structure $\mathbb{A} = (A, (R_i)_{i \in J})$, where $R_i \subseteq A^{n_i}$ for $i \in J$. 
Given further relations $R_j$, $j \in K$, $J \cap K = \emptyset$, we also write $(\mathbb{A}, (R_i)_{i \in K})$ for 
the structure $(A, (R_i)_{i \in J \cup K})$. First-order formulas over $\mathbb{A}$ are built from the 
atomic formulas $R_i(x_1, \ldots, x_{n_i})$ and $x = y$ (where $i \in J$ and $x_1, \ldots, x_{n_i}, x, y$ 
are variables ranging over A) using Boolean connectives and quantifications over 
variables. The notion of a free variable is defined as usual. A formula without free 
variables is a sentence. If $\varphi(x_1, \ldots, x_n)$ is a first-order formula with free variables 
among $x_1, \ldots, x_n$ and $a_1, \ldots, a_n \in A$, then $\mathbb{A} \models \varphi(a_1, \ldots, a_n)$ means that $\varphi$ 
evaluates to true in $\mathbb{A}$ if $x_i$ evaluates to $a_i$. The first-order theory of $\mathbb{A}$, denoted 
by $FOTh(\mathbb{A})$, is the set of all first-order sentences $\varphi$ with $\mathbb{A} \models \varphi$. The existential 
first-order theory $\exists FOTh(\mathbb{A})$ of $\mathbb{A}$ is the set of all sentences in $FOTh(\mathbb{A})$ of the 
form $\exists x_1 \cdots \exists x_n : \varphi(x_1, \ldots, x_n)$, where $\varphi(x_1, \ldots, x_n)$ is a Boolean combination 
of atomic formulas. The positive theory $posTh(\mathbb{A})$ is the set of all sentences in 
$FOTh(\mathbb{A})$ that do not use negations, i.e., that are built from atomic formulas 
using conjunctions, disjunctions, and existential and universal quantifications. 

We view a monoid $\mathcal{M} = (M, \circ, 1)$ as a relational structure by considering
the multiplication $\circ$ as a ternary relation and the constant 1 as a unary relation.
Instead of $\circ(x, y, z)$ we write $x \circ y = z$ or briefly $xy = z$. We also consider
extensions $(\mathcal{M}, (R_i)_{i \in J})$ of the structure $\mathcal{M}$, where $R_i$ is
a relation of arbitrary arity over $M$. In many cases, a partial involution $\iota$ on $M$
will belong to the $R_i$. It is viewed as a binary relation on $M$. In case
$\mathcal{C} \subseteq 2^M$, we also write $(\mathcal{M}, \mathcal{C}, (R_i)_{i \in J})$
instead of $(\mathcal{M}, (L)_{L \in \mathcal{C}}, (R_i)_{i \in J})$ and call formulas of
the form $x \in L$ for $L \in \mathcal{C}$ constraints. Constants from $M$ can be included
as singleton subsets into $\mathcal{C}$. Note that if $\mathcal{M}$ is finitely generated
by $\Gamma$, then constants from $\Gamma$ suffice in order to define all monoid elements
of $\mathcal{M}$. On the other hand, the further investigations are not restricted to
finitely generated monoids.

It is known that already the $\forall\exists^3$-fragment of $\mathrm{FOTh}(\{a, b\}^*, a, b)$
is undecidable [10]. Together with Presburger's result on the decidability of $\mathrm{FOTh}(\mathbb{N})$ it
follows that the decidability of the full first-order theory is not preserved under 
free products. For a restricted class of monoids and existential sentences, we will 
show such a closure result in Section 3.2 even for general graph products. 



3 Existential Theories of Graph Products 

Based on results from [8] for finitely generated trace monoids with a partial 
involution, we prove in Section 3.1 a general preservation theorem for existential 
theories. In Section 3.2 we use this result in order to show that under some 
restrictions graph products preserve the decidability of the existential theory. 

All our decidability results in Section 3 are based on the main result from 
[8], see also [6, Thm. 3.1]: 

Theorem 1. For every $k \geq 0$, the following problem is in PSPACE:

INPUT: A finite independence alphabet $(A, I)$ with $c(A, I) \leq k$, a partial
involution $\iota$ on $A$ that is compatible with $I$, and an existential sentence
$\phi$ over $(\mathbb{M}(A, I), \mathrm{REC}(\mathbb{M}(A, I)), \iota)$ (with $\iota$
lifted to $\mathbb{M}(\mathrm{dom}(\iota), I)$).

QUESTION: Does $(\mathbb{M}(A, I), \mathrm{REC}(\mathbb{M}(A, I)), \iota) \models \phi$ hold?

In Thm. 1, a recognizable set $L \in \mathrm{REC}(\mathbb{M}(A, I))$ has to be represented
by a finite automaton for the regular language $\{ u \in A^* \mid [u]_I \in L \}$, which is
crucial for the PSPACE upper bound, see e.g. the remarks in [6]. Since every singleton
subset belongs to $\mathrm{REC}(\mathbb{M}(A, I))$, constants are implicitly allowed in Thm. 1.

Thm. 1 cannot be extended to the case of rational constraints: By [16,
Prop. 2.9.2 and 2.9.3], $\exists\mathrm{FOTh}(\mathbb{M}(A, I), \mathrm{RAT}(\mathbb{M}(A, I)))$
is decidable if and only if $I \cup \mathrm{Id}_A$ is an equivalence relation.



3.1 A General Preservation Theorem 

For the further discussion let us fix an independence alphabet $(A, I)$, a partial
involution $\iota$ on $A$, a subset $\mathcal{C} \subseteq 2^A$ of constraints, and
additional predicates $R_j$ ($1 \leq j \leq m$) of arbitrary arity over $A$. Let
$\mathbb{M} = \mathbb{M}(A, I)$ and $\mathbb{A} = (A, \iota, (L)_{L \in \mathcal{C}})$.
Throughout this section we assume that:

(1) $\iota$ is compatible with $I$,

(2) there are only finitely many clans of $(A, I)$, i.e., there are only finitely many
sets $D(a)$,

(3) $\mathrm{dom}(\iota)$ as well as every clan of $(A, I)$ belong to $\mathcal{C}$, and

(4) $\exists\mathrm{FOTh}(\mathbb{A}, (R_j)_{1 \leq j \leq m})$ is decidable.

Due to (1), we can lift $\iota$ to a partial involution with domain
$\mathbb{M}(\mathrm{dom}(\iota), I)$. Since $I$ is a union of Cartesian products of clans,
(2) and (3) imply that $I$ is definable by a Boolean formula over
$(A, (L)_{L \in \mathcal{C}})$.

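From the unary predicates in $\mathcal{C}$ we construct a set
$\mathcal{L}(\mathcal{C}, I) \subseteq 2^{\mathbb{M}}$ as follows:
A $\mathcal{C}$-automaton $\mathcal{A}$ is a finite automaton in the usual sense, except
that every edge of $\mathcal{A}$ is labeled with some language $L \in \mathcal{C}$. The
language $L(\mathcal{A}) \subseteq A^*$ is the union of all concatenations
$L_1 L_2 \cdots L_n$ for which there exists a path
$q_0 \xrightarrow{L_1} q_1 \xrightarrow{L_2} \cdots q_{n-1} \xrightarrow{L_n} q_n$ in
$\mathcal{A}$ from the initial state $q_0$ to a final state $q_n$. We say that
$\mathcal{A}$ is $I$-closed if $[u]_I = [v]_I$ and $u \in L(\mathcal{A})$ imply
$v \in L(\mathcal{A})$. In the following, we will identify $L(\mathcal{A})$ with
$\{ [u]_I \mid u \in L(\mathcal{A}) \} \subseteq \mathbb{M}$. Then
$\mathcal{L}(\mathcal{C}, I)$ consists of all languages $L(\mathcal{A}) \subseteq \mathbb{M}$
such that $\mathcal{A}$ is an $I$-closed $\mathcal{C}$-automaton. For effectiveness
statements, it is necessary that languages in $\mathcal{C}$ have some finite representation.
Then, also languages from $\mathcal{L}(\mathcal{C}, I)$ have a canonical finite representation.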
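Membership of a word in the language of such an automaton can be tested by the usual subset simulation, even when the alphabet is infinite. The Python sketch below is illustrative only: it assumes that every constraint set is given as a membership predicate on letters, it does not check $I$-closedness, and the name `accepts` as well as the example automaton are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Letter = Hashable
Constraint = Callable[[Letter], bool]   # membership test for a set L in C
Edge = Tuple[int, Constraint, int]      # (source state, label L, target state)


def accepts(edges: List[Edge], initial: int, finals: Set[int],
            word: Iterable[Letter]) -> bool:
    """Return True iff word lies in L_1 L_2 ... L_n for some accepting path."""
    current = {initial}
    for letter in word:
        # Follow every edge whose label set contains the current letter.
        current = {q for (p, label, q) in edges
                   if p in current and label(letter)}
    return bool(current & finals)


# Example over the infinite alphabet of integers with C = {even, odd}:
# the automaton accepts the words in (even odd)*.
def even(a): return a % 2 == 0
def odd(a): return a % 2 == 1

example_edges = [(0, even, 1), (1, odd, 0)]
```

Each edge consumes exactly one letter, because the labels are subsets of the alphabet rather than languages of longer words.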

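Since $A \subseteq \mathbb{M}$, we can view every relation $R_j$ also as a relation on
$\mathbb{M}$. This is done in the following theorem (where $\iota$ denotes the lifting to
$\mathbb{M}(\mathrm{dom}(\iota), I)$), which is the main result of this section:

Theorem 2. If $\exists\mathrm{FOTh}(A, \iota, (L)_{L \in \mathcal{C}}, (R_j)_{1 \leq j \leq m})$
belongs to $\mathrm{NSPACE}(s(n))$, then
$\exists\mathrm{FOTh}(\mathbb{M}, \iota, \mathcal{L}(\mathcal{C}, I), (R_j)_{1 \leq j \leq m})$
belongs to $\mathrm{NSPACE}(2^{O(n)} + s(n^{O(1)}))$.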



Reducing the number of generators. The main difficulty in the proof of
Thm. 2 is to reduce the infinite set $A$ of generators of $\mathbb{M}$ to a finite set of
generators $B$. For this, we will prove a technical lemma (Lemma 2) in this
paragraph. In the sequel, we will restrict to some reduct
$(A, \iota, (L)_{L \in \mathcal{D}})$ of the structure $\mathbb{A}$ from the previous
section, where $\mathcal{D} \subseteq \mathcal{C}$ is finite and contains
$\mathrm{dom}(\iota)$ as well as every clan of $(A, I)$. We will denote this reduct by
$\mathbb{A}$ as well. Assume that $\mathcal{D} = \{L_0, L_1, \ldots, L_k\}$, where
$\mathrm{dom}(\iota) = L_0$ and $L_1, \ldots, L_\ell$ ($\ell \leq k$) is an enumeration of
the clans of $(A, I)$. Thus, $\{L_1, \ldots, L_\ell\}$ is a partition of $A$.

Given another structure $\mathbb{B} = (B, \zeta, (K_i)_{0 \leq i \leq k})$ (with $\zeta$ a
partial involution on $B$, $K_i \subseteq B$, and $K_0 = \mathrm{dom}(\zeta)$), a mapping
$f : A \to B$ is a strong homomorphism from $\mathbb{A}$ to $\mathbb{B}$ if for all
$a \in A$ and $0 \leq i \leq k$:

$$a \in L_i \iff f(a) \in K_i \qquad \text{and} \qquad \forall a \in \mathrm{dom}(\iota) : f(\iota(a)) = \zeta(f(a)).$$
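For finite structures the two conditions of a strong homomorphism can be checked directly. The following Python sketch is an illustration under the assumption that both carrier sets are finite (in the paper the set $A$ may be infinite); the function name and the example data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def is_strong_homomorphism(f: Dict[Hashable, Hashable],
                           A: Iterable[Hashable],
                           L: List[Set[Hashable]],
                           iota: Dict[Hashable, Hashable],
                           K: List[Set[Hashable]],
                           zeta: Dict[Hashable, Hashable]) -> bool:
    # Condition 1: a in L_i  <=>  f(a) in K_i, for every index i.
    for a in A:
        for L_i, K_i in zip(L, K):
            if (a in L_i) != (f[a] in K_i):
                return False
    # Condition 2: f(iota(a)) = zeta(f(a)) for every a in dom(iota).
    for a, image in iota.items():
        if f[a] not in zeta or f[image] != zeta[f[a]]:
            return False
    return True


# Example: A = {0,...,5}, L_0 = dom(iota) = {0, 1}, L_1 = evens, L_2 = odds.
A_set = range(6)
L_sets = [{0, 1}, {0, 2, 4}, {1, 3, 5}]
iota_map = {0: 1, 1: 0}
K_sets = [{"e0", "o0"}, {"e0", "e"}, {"o0", "o"}]
zeta_map = {"e0": "o0", "o0": "e0"}
f_map = {0: "e0", 1: "o0", 2: "e", 4: "e", 3: "o", 5: "o"}
```

In the example, `f_map` contracts all elements with the same membership pattern to a single point, which is the idea behind the construction in Lemma 1.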



Lemma 1. Given a finite $\mathcal{D} \subseteq \mathcal{C}$, we can effectively construct
a finite structure $\mathbb{B} = (B, \zeta, (K_i)_{0 \leq i \leq k})$ (with $\zeta$ a
partial involution on $B$, $K_i \subseteq B$, and $\mathrm{dom}(\zeta) = K_0$) such that
$|B| \leq 2^{k+1}(2^{k+1} + 2)$ and there exist strong homomorphisms
$f : \mathbb{A} \to \mathbb{B}$ and $g : \mathbb{B} \to \mathbb{A}$ with $f$ surjective.

Proof (sketch). The picture (not reproduced here) visualizes the construction for
$k = 2$. The set $L_1$ (resp. $L_2$) is represented by the left (resp. lower) half of the
whole square, which represents $A$. The inner circle represents
$\mathrm{dom}(\iota) = L_0$, and the thin lines represent the partial involution $\iota$.
The 22 regions that are bounded by thick lines represent the (in general infinite)
preimages $f^{-1}(b)$ ($b \in B$). Basically, $\mathbb{B}$ results by contracting every
nonempty region to a single point. Nonemptiness of a region can be expressed as an
existential sentence over $\mathbb{A}$ and is therefore decidable. □

Since the strong homomorphism $f$ is surjective in the previous lemma and
$\{L_1, \ldots, L_\ell\}$ is a partition of $A$, also $\{K_1, \ldots, K_\ell\}$ is a
partition of $B$.

Now assume that we have given a third structure
$\mathbb{C} = (C, \xi, (\Lambda_i)_{0 \leq i \leq k})$, where $C$ is finite, $\xi$ is a
partial involution on $C$, $\Lambda_i \subseteq C$ for $0 \leq i \leq k$,
$\mathrm{dom}(\xi) = \Lambda_0$, and $\{\Lambda_1, \ldots, \Lambda_\ell\}$ is a partition
of $C$ (with $\Lambda_i = \emptyset$ allowed). In the sequel, an embedding of $\mathbb{C}$
in $\mathbb{A}$ is an injective strong homomorphism $h : \mathbb{C} \to \mathbb{A}$. By
taking the disjoint union of $\mathbb{C}$ and the structure $\mathbb{B}$ from Lemma 1, it
is not hard to prove the following lemma, where
$f(I) = \{ (f(a), f(b)) \mid (a, b) \in I \}$ and similarly for $g(J)$.

Lemma 2. Given $\mathcal{D}$ and $\mathbb{C}$ as above, we can effectively construct a
finite structure $\mathbb{B} = (B, \zeta, (K_i)_{0 \leq i \leq k})$ (with $\zeta$ a partial
involution on $B$, $K_i \subseteq B$, and $\mathrm{dom}(\zeta) = K_0$) together with an
independence relation $J \subseteq B \times B$ such that:

— $C \subseteq B$, $|B| \leq 2^{k+1}(2^{k+1} + 2) + |C|$, $\zeta$ is compatible with $J$, and

— for every embedding $h : \mathbb{C} \to \mathbb{A}$ there are strong homomorphisms
$f : \mathbb{A} \to \mathbb{B}$ and $g : \mathbb{B} \to \mathbb{A}$ with $f(I) \subseteq J$,
$g(J) \subseteq I$, and $f(h(c)) = c$, $g(c) = h(c)$ for all $c \in C$.

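Proof of Thm. 2. Fix a Boolean formula $\theta$ over
$(\mathbb{M}, \iota, \mathcal{L}(\mathcal{C}, I), (R_j)_{1 \leq j \leq m})$. We have to
decide whether $\theta$ is satisfiable in
$(\mathbb{M}, \iota, \mathcal{L}(\mathcal{C}, I), (R_j)_{1 \leq j \leq m})$. For this,
we will present a nondeterministic algorithm that constructs a finitely generated
trace monoid $\mathbb{M}'$ with a partial involution $\zeta$ and a Boolean formula
$\phi'$ over $(\mathbb{M}', \zeta, \mathrm{REC}(\mathbb{M}'))$ such that $\theta$ is
satisfiable in $(\mathbb{M}, \iota, \mathcal{L}(\mathcal{C}, I), (R_j)_{1 \leq j \leq m})$
if and only if for at least one outcome of our nondeterministic algorithm, $\phi'$ is
satisfiable in $(\mathbb{M}', \zeta, \mathrm{REC}(\mathbb{M}'))$. This allows us to apply
Thm. 1.

Assume that every $\mathcal{C}$-automaton in $\theta$ only uses sets among the finite set
$\mathcal{D} \subseteq \mathcal{C}$. Assume that also $\mathrm{dom}(\iota)$ as well as
every clan of $(A, I)$ belong to $\mathcal{D}$. Let $\mathcal{D} = \{L_0, \ldots, L_k\}$,
where $L_0 = \mathrm{dom}(\iota)$ and $L_1, \ldots, L_\ell$ is an enumeration of the
clans of $(A, I)$. Let $\mathcal{L} = \mathcal{L}(\mathcal{D}, I)$.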

First we may push negations to the level of atomic subformulas in $\theta$. Moreover,
disjunctions may be eliminated by nondeterministically guessing one of the two
corresponding disjuncts. Thus, we may assume that $\theta$ is a conjunction of atomic
predicates and negated atomic predicates. We replace every negated equation
$xy \neq z$ (resp. $\iota(x) \neq z$) by $xy = z' \wedge z \neq z'$ (resp.
$\iota(x) = z' \wedge z \neq z'$), where $z'$ is a new variable. Thus, we may assume that
all negated predicates in $\theta$ are of the form $x \neq y$, $x \notin L$
($L \in \mathcal{L}$), and $\neg R_j(x_1, \ldots, x_n)$.
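The first two preprocessing steps (pushing negations inward and guessing disjuncts) can be sketched as follows. This Python fragment is only an illustration: it assumes formulas are nested tuples with the hypothetical tags `"atom"`, `"not"`, `"and"`, `"or"`, and it simulates the nondeterministic guesses by enumerating all of them.

```python
def nnf(f, negate=False):
    """Push negations down to the atoms (negation normal form)."""
    op = f[0]
    if op == "atom":
        return ("not", f) if negate else f
    if op == "not":
        return nnf(f[1], not negate)
    # De Morgan: swap "and"/"or" when passing a negation through.
    if negate:
        op = "or" if op == "and" else "and"
    return (op, nnf(f[1], negate), nnf(f[2], negate))


def guesses(f):
    """For a formula in negation normal form, yield one list of literals
    per way of guessing a disjunct at every disjunction."""
    op = f[0]
    if op in ("atom", "not"):
        yield [f]
    elif op == "or":
        yield from guesses(f[1])
        yield from guesses(f[2])
    else:  # "and"
        for left in guesses(f[1]):
            for right in guesses(f[2]):
                yield left + right
```

The input formula is satisfiable if and only if at least one of the yielded conjunctions is satisfiable, which mirrors the acceptance condition of the nondeterministic procedure.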

We can write $\theta$ as a conjunction $\phi \wedge \psi$, where $\psi$ contains all
formulas of the form $(\neg) R_j(x_1, \ldots, x_n)$. Let $x \neq y$ be a negated equation
in $\phi$, where $x$ and $y$ are variables. Since $x \neq y$ is interpreted in the trace
monoid $\mathbb{M}$, we can replace $x \neq y$ by either
$(x = zau \wedge y = zbv \wedge a, b \in L \wedge a \neq b)$ or
$(x = zu \wedge y = zv \wedge u \in L \cdot \mathbb{M} \wedge v \notin L \cdot \mathbb{M})$,
where $L \in \mathcal{D}$ is a clan of $(A, I)$ that is guessed nondeterministically. In
the first case, we add $a, b \in L \wedge a \neq b$ to the "$A$-local" part $\psi$. In the
second case, we have to construct an $I$-closed $\mathcal{D}$-automaton for
$L \cdot \mathbb{M}$, which is easy, since all clans belong to $\mathcal{D}$. Thus, in the
sequel we may assume that $\phi$ does not contain negated equations.

So far, we have obtained a conjunction $\phi \wedge \psi$, where $\phi$ is interpreted in
$(\mathbb{M}, \iota, \mathcal{L})$ and $\psi$ is interpreted in the base structure
$(\mathbb{A}, (R_j)_{1 \leq j \leq m})$. The formula $\phi$ does not contain negated
equations. Let $\Xi$ be the set of all variables that occur in $\phi \wedge \psi$, and let
$\Omega \subseteq \Xi$ contain all variables that occur in the $A$-local part $\psi$.
Thus, all variables from $\Omega$ are implicitly restricted to $A \subseteq \mathbb{M}$.
Note that variables from $\Omega$ may of course also occur in $\phi$. In case $\phi$
contains a constraint $x \in L$ with $L \in \mathcal{L}$ and $x \in \Omega$, then we can
guess $L' \in \mathcal{D}$ with $L \supseteq L'$ and replace $x \in L$ by the constraint
$x \in L'$, which will be shifted to $\psi$. Hence, we may assume that for every
constraint $x \in L$ that occurs in $\phi$, we have $x \in \Xi \setminus \Omega$.

Next, for every variable $x \in \Omega$ we guess whether $x \in L_0 = \mathrm{dom}(\iota)$
or $x \notin \mathrm{dom}(\iota)$ and add the corresponding (negated) constraint to
$\psi$. In case $x \in \mathrm{dom}(\iota)$ was guessed, we add a new variable
$\tilde{x}$ to $\Omega$ and add the equation $\iota(x) = \tilde{x}$ to $\psi$.
Next, we guess for all different variables $x, y \in \Omega$ (here $\Omega$ refers to the
new set of variables including the added copies $\tilde{x}$), whether $x = y$ or
$x \neq y$. In case $x = y$ is guessed, we can eliminate for instance $y$. Thus, we may
assume that for all different variables $x, y \in \Omega$ the negated equation
$x \neq y$ belongs to $\psi$. Finally, for every set $L_i$ with $1 \leq i \leq k$ and
every $x \in \Omega$ we guess whether $x \in L_i$ or $x \notin L_i$ and add the
corresponding constraint to $\psi$. We denote the resulting formula by $\psi$ as well.

Most of the guessed formulas $\psi$ won't be satisfiable in
$(\mathbb{A}, (R_j)_{1 \leq j \leq m})$ (e.g., if $L_i \cap L_j = \emptyset$ and the
constraints $x \in L_i$ and $x \in L_j$ were guessed). But since
$\exists\mathrm{FOTh}(\mathbb{A}, (R_j)_{1 \leq j \leq m})$ is decidable, we can
effectively check whether the guessed formula $\psi$ is satisfiable. If it is not
satisfiable, then we reject on the corresponding computation path. Let us fix a specific
guess, which results in a satisfiable formula $\psi$, for the further consideration.

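Now we define a finite structure
$\mathbb{C} = (\overline{\Omega}, \xi, (\Lambda_i)_{0 \leq i \leq k})$ as follows: Let
$\overline{\Omega} = \{ \overline{x} \mid x \in \Omega \}$ be a disjoint copy of the set
of variables $\Omega$. For $0 \leq i \leq k$ let $\Lambda_i$ be the set of all
$\overline{x} \in \overline{\Omega}$ such that $x \in L_i$ belongs to $\psi$. Finally, we
define the partial involution $\xi$ on $\overline{\Omega}$ as follows: The domain of
$\xi$ is $\Lambda_0$ and $\xi(\overline{x}) = \overline{y}$ in case $\iota(x) = y$ or
$\iota(y) = x$ belongs to the conjunction $\psi$. Since $\psi$ is satisfiable and
$\{L_1, \ldots, L_\ell\}$ is a partition of $A$, it follows that
$\{\Lambda_1, \ldots, \Lambda_\ell\}$ is a partition of $\overline{\Omega}$ (with
$\Lambda_i = \emptyset$ allowed). Thus, $\mathbb{C}$ satisfies all the requirements from
Lemma 2, which can be applied to the structures $\mathbb{A}$ and $\mathbb{C}$. Hence,
from $\mathbb{C}$ we can effectively determine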
a finite structure $\mathbb{B} = (B, \zeta, (K_i)_{0 \leq i \leq k})$ together with an
independence relation $J \subseteq B \times B$ such that $\overline{\Omega} \subseteq B$,
the partial involution $\zeta$ is compatible with $J$, and for every embedding
$h : \mathbb{C} \to \mathbb{A}$ there exist strong homomorphisms
$f : \mathbb{A} \to \mathbb{B}$ and $g : \mathbb{B} \to \mathbb{A}$ with
$f(I) \subseteq J$, $g(J) \subseteq I$, and $f(h(\overline{x})) = \overline{x}$,
$g(\overline{x}) = h(\overline{x})$ for every $x \in \Omega$. We also obtain a size bound
of $|\Omega| + 2^{O(k)}$ for $|B|$. We denote the lifting of $\zeta$ to
$\mathbb{M}(B, J)$ by $\zeta$ as well. Let $\mathbb{M}' = \mathbb{M}(B, J)$.

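Recall that we have to check whether there exist assignments $\kappa : \Omega \to A$
and $\lambda : \Xi \setminus \Omega \to \mathbb{M}$ such that $\kappa$ satisfies $\psi$ in
$(\mathbb{A}, (R_j)_{1 \leq j \leq m})$ and $\kappa \cup \lambda$ satisfies $\phi$ in
$(\mathbb{M}, \iota, \mathcal{L})$. We have already verified that the conjunction $\psi$
is satisfiable in $(\mathbb{A}, (R_j)_{1 \leq j \leq m})$. Let us fix an arbitrary
assignment $\kappa : \Omega \to A$ that satisfies $\psi$ in
$(\mathbb{A}, (R_j)_{1 \leq j \leq m})$; we do not have to determine $\kappa$ explicitly,
only its existence is important. Then $\kappa$ defines an embedding
$h : \mathbb{C} \to \mathbb{A}$ by $h(\overline{x}) = \kappa(x)$ for $x \in \Omega$.
Therefore there exist strong homomorphisms $f : \mathbb{A} \to \mathbb{B}$ and
$g : \mathbb{B} \to \mathbb{A}$ with $f(\kappa(x)) = \overline{x}$,
$g(\overline{x}) = \kappa(x)$ ($x \in \Omega$), and $f(I) \subseteq J$,
$g(J) \subseteq I$. Hence, we can lift $f$ and $g$ to monoid homomorphisms
$f : \mathbb{M} \to \mathbb{M}'$ and $g : \mathbb{M}' \to \mathbb{M}$ with

$$f(\iota(s)) = \zeta(f(s)) \text{ for } s \in \mathrm{dom}(\iota) \quad \text{and} \quad g(\zeta(t)) = \iota(g(t)) \text{ for } t \in \mathrm{dom}(\zeta). \qquad (1)$$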

Given a $\mathcal{D}$-automaton $\mathcal{A}$, we define a new automaton $\mathcal{A}'$
by replacing every edge $p \xrightarrow{L_i} q$ in $\mathcal{A}$ by
$p \xrightarrow{K_i} q$ (and changing nothing else). Recall that $K_i \subseteq B$.
Since $\mathcal{A}$ is $I$-closed, $\mathcal{A}'$ is easily seen to be $J$-closed.
Moreover, since $B$ is finite, $L(\mathcal{A}') \subseteq \mathbb{M}'$ is a recognizable
trace language. Recall that for every $0 \leq i \leq k$, we have $a \in L_i$ if and only
if $f(a) \in K_i$ and $b \in K_i$ if and only if $g(b) \in L_i$. Thus, the following
statement is obvious:

Lemma 3. Let $s \in \mathbb{M}$ and $t \in \mathbb{M}'$. Then $s \in L(\mathcal{A})$ if
and only if $f(s) \in L(\mathcal{A}')$ and $t \in L(\mathcal{A}')$ if and only if
$g(t) \in L(\mathcal{A})$.

Next, we transform the conjunction $\phi$ into a conjunction $\phi'$, which will be
interpreted over $(\mathbb{M}', \zeta, \mathrm{REC}(\mathbb{M}'))$, by replacing in $\phi$
every occurrence of a variable $x \in \Omega$ by the constant
$\overline{x} \in \overline{\Omega} \subseteq B$. Thus, $\phi'$ contains constants from
$\overline{\Omega}$ and variables from $\Xi \setminus \Omega$, which range over the trace
monoid $\mathbb{M}'$. Moreover, the partial involution $\iota$ is replaced by $\zeta$ and
every constraint $x \in L(\mathcal{A})$ (resp. $x \notin L(\mathcal{A})$) in $\phi$ is
replaced by $x \in L(\mathcal{A}')$ (resp. $x \notin L(\mathcal{A}')$) (note that
$x \in \Xi \setminus \Omega$). Thus, all constraint languages in $\phi'$ belong to
$\mathrm{REC}(\mathbb{M}')$.

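Lemma 4. The following two statements are equivalent:

(a) There exists an assignment $\lambda : \Xi \setminus \Omega \to \mathbb{M}$ such that
$\kappa \cup \lambda$ satisfies the Boolean formula $\phi$ in
$(\mathbb{M}, \iota, \mathcal{L})$.

(b) There exists an assignment $\lambda' : \Xi \setminus \Omega \to \mathbb{M}'$ that
satisfies the Boolean formula $\phi'$ in $(\mathbb{M}', \zeta, \mathrm{REC}(\mathbb{M}'))$.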

Proof. Let the function $\pi$ map every variable $x \in \Omega$ to the constant
$\overline{x} \in B$. Thus, $f(\kappa(x)) = \pi(x)$ and $g(\pi(x)) = \kappa(x)$ for every
$x \in \Omega$. First, assume that (a) holds. We claim that (b) holds with
$\lambda' = f \circ \lambda$. Let $u' = v'$ be an equation of $\phi'$, which results from
the equation $u = v$ of $\phi$. The difference between $u = v$ and $u' = v'$ is that the
partial involution $\iota$ is replaced by $\zeta$ and every occurrence of a variable
$x \in \Omega$ in $u = v$ is replaced by the constant $\pi(x) = \overline{x}$ in
$u' = v'$. The assignment $\kappa \cup \lambda$ is a solution of $u = v$ in
$(\mathbb{M}, \iota)$. Since the monoid homomorphism $f : \mathbb{M} \to \mathbb{M}'$
satisfies (1) and
$f \circ (\kappa \cup \lambda) = f \circ \kappa \cup f \circ \lambda = \pi \cup \lambda'$,
the mapping $\lambda'$ is a solution of $u' = v'$ in $(\mathbb{M}', \zeta)$. The
satisfaction of (negated) constraints by $\lambda'$ follows from Lemma 3.

Now assume that (b) holds. We claim that (a) holds with $\lambda = g \circ \lambda'$.
Consider an equation $u = v$ of $\phi$ and let $u' = v'$ be the corresponding equation of
$\phi'$. Thus, $\lambda'$ is a solution of $u' = v'$ in $(\mathbb{M}', \zeta)$. By
construction of $u' = v'$, $\lambda' \cup \pi$ is a solution of $u = v$ in
$(\mathbb{M}', \zeta)$. Since the monoid homomorphism $g : \mathbb{M}' \to \mathbb{M}$
satisfies (1), the mapping $g \circ (\lambda' \cup \pi) = \lambda \cup \kappa$ is a
solution of $u = v$ in $(\mathbb{M}, \iota)$. For the satisfaction of (negated)
constraints by $\lambda$ we use again Lemma 3. □

For the previous lemma it is crucial that the conjunction $\phi$ does not contain
negated equations, because the homomorphisms $f$ and $g$ are not injective in
general, and therefore do not preserve inequalities.

Since Lemma 4 holds for every $\kappa : \Omega \to A$ that satisfies $\psi$ in the
structure $(\mathbb{A}, (R_j)_{1 \leq j \leq m})$, and we already know that such an
assignment exists, it only remains to check whether $\phi'$ is satisfiable in
$(\mathbb{M}', \zeta, \mathrm{REC}(\mathbb{M}'))$. By Thm. 1 this can be done effectively.
This proves the decidability statement in Thm. 2. A closer investigation of the outlined
decision procedure gives the space bounds in Thm. 2 (note that $c(B, J) = c(A, I)$ is a
constant).

3.2 Closure under Graph Products 

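Fix a graph product $\mathbb{P} = \mathbb{P}(\Sigma, I_\Sigma, (\mathcal{M}_\sigma)_{\sigma \in \Sigma})$,
where $\mathcal{M}_\sigma = (M_\sigma, \circ_\sigma, 1_\sigma)$. Define $A$, $I$, $R$,
$\mathrm{IRR}$, and $\omega : \mathbb{P} \to \mathrm{IRR}$ as in Section 2.2. Recall that
$\omega$ is bijective. Let
$\mathrm{inv}_\sigma = \{ (a, b) \in M_\sigma \times M_\sigma \mid a \circ_\sigma b = 1_\sigma \}$
and $\mathrm{inv} = \bigcup_{\sigma \in \Sigma} \mathrm{inv}_\sigma$,
$U_\sigma = \mathrm{dom}(\mathrm{inv}_\sigma)$, $V_\sigma = \mathrm{ran}(\mathrm{inv}_\sigma)$,
$U = \bigcup_{\sigma \in \Sigma} U_\sigma$, and $V = \bigcup_{\sigma \in \Sigma} V_\sigma$.

We also include constraints into our considerations. Hence, for every
$\sigma \in \Sigma$ let $\mathcal{C}_\sigma \subseteq 2^{M_\sigma}$ be a class of
constraints. We assume that $U_\sigma, V_\sigma \in \mathcal{C}_\sigma$. Let
$\mathcal{C} = \bigcup_{\sigma \in \Sigma} \mathcal{C}_\sigma$. Recall the definition of
the class $\mathcal{L} = \mathcal{L}(\mathcal{C}, I) \subseteq 2^{\mathbb{M}(A, I)}$ from
Section 3.1. We define the class
$\mathcal{IL} = \mathcal{IL}(\mathcal{C}, I, R) \subseteq 2^{\mathrm{IRR}}$ by
$\mathcal{IL} = \{ L \cap \mathrm{IRR} \mid L \in \mathcal{L} \}$. Using the one-to-one
correspondence between $\mathbb{P}$ and $\mathrm{IRR}$, we may view
$L \cap \mathrm{IRR}$ also as a subset of $\mathbb{P}$, hence
$\mathcal{IL} \subseteq 2^{\mathbb{P}}$.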

Example. If $\mathrm{REC}(\mathcal{M}_\sigma) \subseteq \mathcal{C}_\sigma$, then also
$\mathrm{REC}(\mathbb{P}) \subseteq \mathcal{IL}$ [6, Lemma 4.8]. A subset
$L \subseteq M$ of a monoid $\mathcal{M}$, which is finitely generated by $\Gamma$, is
called normalized rational if the set of length-lexicographical normal forms from
$\Gamma^*$ (w.r.t. an arbitrary linear order on $\Gamma$) that represent elements from
$L$ is rational [6]. It is not hard to see that $\mathcal{IL}$ is the set of normalized
rational subsets of $\mathbb{P}$ in case $\mathcal{C}_\sigma$ is the set of normalized
rational subsets of $\mathcal{M}_\sigma$.

Throughout this section we make:

Assumption 3. For all $\sigma \in \Sigma$ and $a, b, c \in M_\sigma$, if
$a \circ_\sigma b = a \circ_\sigma c = 1_\sigma$ or
$b \circ_\sigma a = c \circ_\sigma a = 1_\sigma$, then $b = c$. Thus
$\mathrm{inv}_\sigma$ is a partial injection.

For example, groups, free monoids, the bicyclic monoid $\{a, b\}^*/_{ab = \varepsilon}$,
and finite monoids satisfy Assumption 3,¹ whereas
$\{a, b, c\}^*/_{ab = ac = \varepsilon}$ does not.
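For a finite monoid the assumption can be verified by brute force. The sketch below is an illustration only: it assumes the monoid is given by a list of elements, a multiplication function and an identity, and the names `inv_relation` and `satisfies_assumption_3` are hypothetical. The example uses the monoid of all maps on a two-element set.

```python
from itertools import product


def inv_relation(elements, mul, one):
    """All pairs (a, b) with a * b = 1."""
    return {(a, b) for a, b in product(elements, repeat=2) if mul(a, b) == one}


def satisfies_assumption_3(elements, mul, one):
    """Check that right inverses and left inverses are unique."""
    right = {}  # a -> the unique b with a * b = 1
    left = {}   # b -> the unique a with a * b = 1
    for a, b in inv_relation(elements, mul, one):
        if right.setdefault(a, b) != b or left.setdefault(b, a) != a:
            return False
    return True


# Example: all maps {0,1} -> {0,1}, written as tuples (f(0), f(1)).
T2 = list(product(range(2), repeat=2))
def compose(f, g): return tuple(g[f[x]] for x in range(2))
identity = (0, 1)
```

In the example only the identity and the swap are invertible, so the relation of inverse pairs is symmetric, in line with the footnote on finite monoids.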

By Assumption 3, $\mathrm{inv}$ is a partial injection on $A$ with
$\mathrm{dom}(\mathrm{inv}) = U$ and $\mathrm{ran}(\mathrm{inv}) = V$. Since
$\mathrm{inv}$ is compatible with $I$, we can lift it to $\mathbb{M}(A, I)$ (see
Section 2.1). The resulting partial injection has domain $\mathbb{M}(U, I)$ and range
$\mathbb{M}(V, I)$. The following theorem is the main result of this section.

Theorem 4. If Assumption 3 holds and for all $\sigma \in \Sigma$,
$\exists\mathrm{FOTh}(\mathcal{M}_\sigma, \mathcal{C}_\sigma)$ belongs to
$\mathrm{NSPACE}(s(n))$, then $\exists\mathrm{FOTh}(\mathbb{P}, \mathcal{IL})$ belongs to
$\mathrm{NSPACE}(2^{O(n)} + s(n^{O(1)}))$.

Before we go into the details of the proof of Thm. 4 let us first present an 
application. The existential theory of a finite monoid is decidable for trivial 
reasons. By Makanin’s result, the existential theory with constants of a free 
monoid is also decidable. Finally, by [19,21], also the existential theory with 
constants of a torsion-free hyperbolic group is decidable. Note that every free 
group is torsion-free hyperbolic. Since finite monoids, free monoids, and groups 
in general all satisfy Assumption 3 (and either $U_\sigma = V_\sigma = \{1_\sigma\}$ or $U_\sigma = V_\sigma = M_\sigma$
for these monoids), we obtain the following corollary:

Corollary 1. Let $\mathbb{P}$ be a graph product of finite monoids, free monoids, and
torsion-free hyperbolic groups, and let $\Gamma$ be a finite generating set for
$\mathbb{P}$. Then $\exists\mathrm{FOTh}(\mathbb{P}, (a)_{a \in \Gamma})$ is decidable.
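The remark that the existential theory of a finite monoid is decidable for trivial reasons amounts to enumerating all assignments. A minimal Python sketch, assuming the quantifier-free part of the sentence is supplied as a Python predicate on an assignment (the function name and the examples are hypothetical):

```python
from itertools import product


def exists_assignment(elements, variables, matrix):
    """Decide 'exists variables : matrix' over a finite carrier by enumeration."""
    for values in product(elements, repeat=len(variables)):
        env = dict(zip(variables, values))
        if matrix(env):
            return True
    return False


# Example sentence: exists x, y : x * y = 0 and x != 0 and y != 0,
# evaluated in the multiplicative monoid of integers modulo n.
def zero_divisors(n):
    return exists_assignment(
        range(n), ["x", "y"],
        lambda e: (e["x"] * e["y"]) % n == 0 and e["x"] != 0 and e["y"] != 0)
```

The running time is exponential in the number of variables, which is harmless for decidability but shows why the infinite factors need the results cited above.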

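For the proof of Thm. 4 assume that
$\exists\mathrm{FOTh}(\mathcal{M}_\sigma, \mathcal{C}_\sigma)$ belongs to
$\mathrm{NSPACE}(s(n))$ for every $\sigma \in \Sigma$. Thus, the same holds for
$\exists\mathrm{FOTh}(\mathcal{M}_\sigma, \mathrm{inv}_\sigma, \mathcal{C}_\sigma)$. This
remains true if we put $M_\sigma$ and $M_\sigma \setminus \{1_\sigma\}$ into
$\mathcal{C}_\sigma$. Then $\mathrm{IRR} \in \mathcal{L}$: the language
$L = \Sigma^* \setminus \bigcup_{\sigma \in \Sigma} \Sigma^* \sigma I_\Sigma(\sigma)^* \sigma \Sigma^*$
is regular. In order to define a $\mathcal{C}$-automaton for $\mathrm{IRR}$, we just have
to replace in a finite automaton for $L$ every label $\sigma$ by
$M_\sigma \setminus \{1_\sigma\}$.

We may also replace $\mathcal{C}$ by its closure under union; this does not change the
class $\mathcal{L} = \mathcal{L}(\mathcal{C}, I)$. Thus, the sets $U$, $V$, $U \cup V$,
and every clan of $(A, I)$ (which is a union of some of the $M_\sigma$) belong to
$\mathcal{C}$. Then $\mathbb{M}(U, I)$ (the domain of the lifting of $\mathrm{inv}$ to
$\mathbb{M}(A, I)$) belongs to $\mathcal{L}$.

Since by Assumption 3, $\mathrm{inv} : U \to V$ is a partial injection, we can define
a partial involution $\iota$ on $A$ with domain $U \cup V \in \mathcal{C}$ by
$\iota(a) = b$ if and only if either $\mathrm{inv}(a, b)$ or $\mathrm{inv}(b, a)$ (note
that $\mathrm{inv}(a, b)$ and $\mathrm{inv}(b, c)$ implies $a = c$). This involution on
$A$ is compatible with $I$, hence it can be lifted to a partial involution $\iota$ on
$\mathbb{M}(A, I)$ with domain $\mathbb{M}(U \cup V, I)$.

Since $\exists\mathrm{FOTh}(\mathcal{M}_\sigma, \mathrm{inv}_\sigma, \mathcal{C}_\sigma)$
belongs to $\mathrm{NSPACE}(s(n))$ for every $\sigma \in \Sigma$, the same is true for
$\exists\mathrm{FOTh}(A, \iota, (L)_{L \in \mathcal{C}}, (\circ_\sigma)_{\sigma \in \Sigma})$.
Thm. 2 (with $(\circ_\sigma)_{\sigma \in \Sigma}$ for $(R_j)_{1 \leq j \leq m}$) shows
that the theory
$\exists\mathrm{FOTh}(\mathbb{M}(A, I), \iota, \mathcal{L}, (\circ_\sigma)_{\sigma \in \Sigma})$
belongs to $\mathrm{NSPACE}(2^{O(n)} + s(n^{O(1)}))$ (note that the conditions (1)-(4)
from Section 3.1 are all satisfied in the present situation).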

Let $\theta$ be a Boolean formula with atomic predicates of the form $xy = z$ and
$x \in L$, where $L \in \mathcal{IL}$ (atomic predicates of the form $x = 1$ are not
necessary since $\{1\} \in \mathcal{IL}$). We have to check whether there exists an
assignment for the variables in $\theta$ to elements in $\mathbb{P}$ that satisfies
$\theta$. For this, we transform $\theta$ in polynomial time into an equivalent
existential statement over
$(\mathbb{M}(A, I), \iota, \mathcal{L}, (\circ_\sigma)_{\sigma \in \Sigma})$. Thus, in
some sense we isolate the structure of the $\mathcal{M}_\sigma$ into the
"$\mathcal{M}_\sigma$-local" $\circ_\sigma$-predicates.

¹ For a finite monoid note that $a \circ b = 1$ implies that the mapping
$x \mapsto b \circ x$ is injective, hence it is surjective. Thus, there exists $c$ with
$b \circ c = 1$. Clearly $a = c$, i.e., $b \circ a = 1$, and $\mathrm{inv}_\sigma$ is a
partial involution.

First, we may push negations to the level of atomic subformulas in $\theta$. We
replace every negated equation $xy \neq z$ by $xy = z' \wedge z \neq z'$, where $z'$ is a
new variable. Thus, we may assume that all negated predicates in $\theta$ are of the form
$x \neq y$ and $x \notin L$ for variables $x$ and $y$.

Recall from Section 2.2 that every $x \in \mathbb{P}$ has a unique representative
$\omega(x) \in \mathrm{IRR} \subseteq \mathbb{M}(A, I)$ and that $xy = z$ in $\mathbb{P}$
if and only if $\omega(x)\omega(y) \to_R^* \omega(z)$. Moreover, for
$L = L' \cap \mathrm{IRR} \in \mathcal{IL}$ with $L' \in \mathcal{L}$ we have $x \in L$ if
and only if $\omega(x) \in L'$. Hence, if we add for every variable $x$ in $\theta$ the
constraint $x \in \mathrm{IRR}$ (recall that $\mathrm{IRR} \in \mathcal{L}$) and replace
every equation $xy = z$ in $\theta$ by the predicate $xy \to_R^* z$, then we obtain a
formula, which is satisfiable in the trace monoid $\mathbb{M}(A, I)$ if and only if the
original formula $\theta$ is satisfiable in $\mathbb{P}$. Using the following lemma, we
can replace the predicates $xy \to_R^* z$ by ordinary equations plus
$\circ_\sigma$-predicates. For the proof of this lemma, Assumption 3 is essential.

Lemma 5. There exists a fixed Boolean formula $\psi(x, y, z, x_1, \ldots, x_m)$ over the
structure $(\mathbb{M}(A, I), \iota, \mathcal{L}, (\circ_\sigma)_{\sigma \in \Sigma})$
such that for all $x, y, z \in \mathrm{IRR}$, $xy \to_R^* z$ if and only if
$(\mathbb{M}(A, I), \iota, \mathcal{L}, (\circ_\sigma)_{\sigma \in \Sigma}) \models \exists x_1 \cdots \exists x_m : \psi(x, y, z, x_1, \ldots, x_m)$.

We obtain an equivalent formula over
$(\mathbb{M}(A, I), \iota, \mathcal{L}, (\circ_\sigma)_{\sigma \in \Sigma})$, whose size
increased by a constant factor. This concludes the proof of Thm. 4.

4 Positive Theories of Graph Products 

In this section we consider positive theories. Let
$\mathbb{P} = \mathbb{P}(\Sigma, I_\Sigma, (G_\sigma)_{\sigma \in \Sigma})$ be a graph
product, where every $G_\sigma$ is a finitely generated group. Let $\Gamma_\sigma$ be a
finite generating set for $G_\sigma$. Then $\mathbb{P}$ is generated by
$\Gamma = \bigcup_{\sigma \in \Sigma} \Gamma_\sigma$. A node $\sigma \in \Sigma$ is a
cone if $I_\Sigma(\sigma) = \Sigma \setminus \{\sigma\}$. Since we restrict to finitely
generated groups, we obtain finite representations for recognizable constraints: If
$L = h^{-1}(F) \in \mathrm{REC}(\mathbb{P})$, where $h : \mathbb{P} \to Q$ is a
homomorphism to a finite monoid $Q$ and $F \subseteq Q$, then $L$ can be represented by
$h$ and $F \subseteq Q$. To represent $h$, it suffices to specify $h(a)$ for every
generator $a \in \Gamma$. The next theorem is our main result for positive theories; its
proof is similar to the proof of Corollary 18 in [5], see [6], and uses the
Feferman-Vaught decomposition theorem [11].

Theorem 5. Assume that: (i) if $\sigma$ is a cone, then
$\mathrm{posTh}(G_\sigma, (a)_{a \in \Gamma_\sigma}, \mathrm{REC}(G_\sigma))$ is decidable
and (ii) if $\sigma$ is not a cone, then
$\exists\mathrm{FOTh}(G_\sigma, (a)_{a \in \Gamma_\sigma}, \mathrm{REC}(G_\sigma))$ is
decidable. Then $\mathrm{posTh}(\mathbb{P}, (a)_{a \in \Gamma}, \mathrm{REC}(\mathbb{P}))$
is decidable.
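The finite representation of a recognizable constraint by a homomorphism and an accepting subset can be sketched as follows. This Python fragment is illustrative only: it assumes that group elements are given as words over the generators, with a formal inverse listed as a separate letter, and all names are hypothetical.

```python
def in_recognizable_set(word, h, mul, one, F):
    """Test whether the element represented by word lies in h^{-1}(F)."""
    value = one
    for a in word:
        value = mul(value, h[a])  # h is determined by its values on generators
    return value in F


# Example: the infinite cyclic group with generator "a" and inverse "A",
# mapped to the integers modulo 3; F = {0} selects exponent sums divisible by 3.
h_mod3 = {"a": 1, "A": 2}
def add_mod3(x, y): return (x + y) % 3
```

Only the images of the generators and the multiplication table of the finite monoid need to be stored, which is what makes the representation finite.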

The theory of $\mathbb{Z}$ with semilinear constraints (which include recognizable
constraints over $\mathbb{Z}$) is decidable [18]. Since the same holds for finite groups,
Thm. 5 implies that for a graph product $\mathbb{P}$ of finite groups and free groups the
theory $\mathrm{posTh}(\mathbb{P}, (a)_{a \in \Gamma}, \mathrm{REC}(\mathbb{P}))$ is
decidable. This result was already shown in [5].

Thm. 5 cannot be extended by allowing monoids for the groups $G_\sigma$: Already
the positive $\forall\exists^3$-theory of the free monoid $\{a, b\}^*$ is undecidable [10].
Similarly, Thm. 5 cannot be extended by replacing $\mathrm{REC}(\mathbb{P})$ by
$\mathrm{RAT}(\mathbb{P})$, since the latter class contains a free monoid $\{a, b\}^*$ in
case $\mathbb{P}$ is the free group of rank 2.

Abstract. We introduce a random planted model of bi-categorical data to model
the problem of collaborative filtering or categorical clustering. We adapt the ideas 
of an algorithm due to Condon and Karp [4] to develop a simple linear time algo- 
rithm to discover the underlying hidden structure of a graph generated according 
to the planted model with high probability. We also give applications to the prob- 
abilistic analysis of Latent Semantic Indexing (LSI) in the probabilistic corpus 
models introduced by Papadimitriou et al. [12]. We carry out an experimental
analysis that shows that the algorithm might work quite well in practice. 



1 Introduction 

Recommendation that is based on simultaneous clusterings across several categories
and an overlap of interests is called collaborative filtering, since the selection of items
is done in a manner resembling individuals collaborating to make recommendations to
their friends. Collaborative filtering methods have been applied in many areas both in
research [14,11,2,10] and in industry [7,15]. In spite of a wealth of tools the problem 
remains challenging and much work remains to be done before a satisfactory solution 
can be found. In particular, besides the ever present need for fast and simple algorithms 
that work well in practical scenarios, perhaps as importantly, there is also a need for 
appropriate theoretical tools that can assist in the development of good algorithms. 

In this paper, we develop a natural model of categorical data based on the so-called
planted partition models [8,4]. In our theoretical analysis, we restrict ourselves, as a
first step, to two categories, called “people” and “movies”. Roughly, we have a set
of “people” $P$ partitioned into two unknown sets $P_1$ and $P_2$ and likewise a set $M$ of
“movies” partitioned into unknown sets $M_1$ and $M_2$. While there are many (random)
edges between $M_i$ and $P_j$ when $i = j$, there are very few (random) connections between
$M_i$ and $P_j$ when $i \neq j$. We seek an algorithm that, if fed with a bipartite graph of this
type, is able to identify the unknown clusters $P_1$, $P_2$ and $M_1$, $M_2$.
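To make the informal description concrete, the following Python sketch generates a random bipartite graph with two planted clusters on each side. It is an illustration under assumptions that are not taken from the paper: the clusters are assigned by parity, every within-cluster pair is joined with probability `p` and every cross-cluster pair with probability `r`, and the function name is hypothetical.

```python
import random


def planted_bipartite_graph(n_people, n_movies, p, r, seed=None):
    """Return hidden cluster labels and the random edge set."""
    rng = random.Random(seed)
    people_cluster = [i % 2 for i in range(n_people)]  # hidden P_1 / P_2
    movie_cluster = [j % 2 for j in range(n_movies)]   # hidden M_1 / M_2
    edges = set()
    for i in range(n_people):
        for j in range(n_movies):
            # Many edges inside matching clusters, few across.
            prob = p if people_cluster[i] == movie_cluster[j] else r
            if rng.random() < prob:
                edges.add((i, j))
    return people_cluster, movie_cluster, edges
```

A clustering algorithm would receive only `edges` and would be judged by how well it recovers the two hidden label lists.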

In this paper we describe a very simple randomized algorithm to discover the hidden 
bi-categorical clustering structure in the quasi-random graphs generated according to 
this model (formally defined in § 2). We analyze the algorithm to show that it succeeds 
in discovering the underlying structure exactly with high probability. The running time 
of the algorithm is linear. 

While in the theoretical analysis we restrict ourselves to the 2 by 2 case, in our
experiments we also test the algorithm in the $k$ by $k$ case, for $k = 2, 3, 4, 5$ ($k$ clusters on
each side). The results are very promising and show that the algorithm might work well
in practice. In particular, for $k > 2$, rather than applying the natural recursive approach,
we exploit a concentration of measure phenomenon first observed in [4] and that seems
to apply in our new context too. Namely, given the input graph, we compute a 2 by 2
partition. Let $P_1$, $P_2$ and $M_1$, $M_2$ be the clusters output by the algorithm. To isolate the
$k$ clusters of type “people” for instance, we process the vertices one by one and group
them in a greedy fashion according to the quantity

#edges from the vertex to $M_1$ − #edges from the vertex to $M_2$.

This random variable is sharply concentrated, so that we obtain $k$ clumps from which the
$k$ true clusters can be easily obtained. The same process can be applied on the movie side.
We remark that while concentration of measure is proven in [4], this is not done here for
our bipartite scenario. The experiments seem to indicate however that the phenomenon
could very well hold.
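The greedy grouping step can be sketched in Python as follows. The text does not fix the grouping rule, so this is only one possible reading: a vertex joins the first existing clump whose first member has a statistic within a threshold `gap`, and otherwise opens a new clump. The threshold and both function names are assumptions made for illustration.

```python
def edge_difference(vertex, edges, M1, M2):
    """#edges from the vertex to M1 minus #edges from the vertex to M2."""
    to_m1 = sum(1 for (v, m) in edges if v == vertex and m in M1)
    to_m2 = sum(1 for (v, m) in edges if v == vertex and m in M2)
    return to_m1 - to_m2


def greedy_clumps(values, gap):
    """Group vertices one by one; values maps each vertex to its statistic."""
    clumps = []  # list of (statistic of first member, [vertices])
    for vertex, value in values.items():
        for rep, members in clumps:
            if abs(value - rep) <= gap:
                members.append(vertex)
                break
        else:
            clumps.append((value, [vertex]))
    return [members for _, members in clumps]
```

If the statistic is sharply concentrated around $k$ well-separated values, any threshold smaller than the separation yields exactly $k$ clumps.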

Our quasi-random models are admittedly not entirely realistic, but we believe they 
form a very relevant test-bed for solutions because (a) they capture a very important 
aspect of the problem, namely the bi-categorical clustering and (b) they admit much 
flexibility and variations towards more realistic models. 

The work in this paper answers the open question posed in [4] whether their ideas can 
be used to address the collaborative filtering or categorical clustering problem. While 
very similar to their algorithm, ours incorporates a few modifications relevant to the 
bicategorical case. The analysis of the algorithm also follows closely the analysis of 
[4], but with one significant new ingredient. While the analysis of [4] can be viewed as 
a discrete probabilistic analogue of a differential equation that governs the underlying 
stochastic process, our analysis is a discrete probabilistic analogue of a coupled system 
of two differential equations. 

In the last section of the paper, we also show that similar ideas are relevant
in information retrieval in the context of probabilistic models and analysis of Latent
Semantic Indexing (LSI), the foundations of which are laid out in [12]. Our analysis
indicates that in these cases where traditional spectral methods for LSI [12,1] work
well, one might achieve even better results more efficiently by applying very simple
probabilistic algorithms. We show that our collaborative filtering or categorical clustering
algorithm can outperform traditional LSI in these probabilistic corpus models.*

* Preliminary experimental analysis suggests that while the quality of the clustering produced by
spectral methods on the one hand and simple probabilistic schemes such as ours on the other
are comparable, our probabilistic methods run much faster.

1.1 Relation to Past Research

Our work falls in the area of graph partitioning and clustering algorithms, both very well- 
studied topics with a large literature. Our work is most closely related to the papers of 
Jerrum and Sorkin [8] and Condon and Karp [4]. The former explored how well the well-
known simulated annealing heuristic (applied at constant temperature as the Metropolis 
algorithm) performs in the graph partitioning problem. They introduced the planted 
partition model of a random graph (discussed in § 2) and showed that the Metropolis
algorithm discovered this structure with high probability in time O(n^). Condon and 
Karp developed simpler algorithms for the problem using a stochastic process inspired 
by the Metropolis algorithm of Jerrum and Sorkin. They extended the planted partition 
model to one with several groups and showed that a simple randomized algorithm could 
discover the planted structure with high probability in linear time. 

There are two other papers that are closely related to our work. Ungar and Foster [16]
also develop models of random graphs that are similar to our planted partition models. 
They adopt a statistical approach and use a Gibbs sampling algorithm similar to the 
well-known EM algorithm to estimate the underlying hidden parameters. Our algorithm 
is much simpler than theirs and also provably more efficient. While [16] can only invoke 
a general convergence result in the limit, and give no guarantees on the running time, 
we prove that our algorithm succeeds in finding the exact partition in linear time with 
high probability. 

Gibson, Kleinberg and Raghavan [6] address the general categorical clustering prob- 
lem using a completely different approach. They describe an iterative weighting algo- 
rithm that can be viewed as a higher order dynamical system. Because much is unknown 
about the properties of higher-order dynamical systems - in particular their convergence 
rates and even the limit - they are able to prove only some weak and limited results. 
They give some convergence results in these special cases, but even here, they are not 
able to give any guarantees on run times. They report some experimental studies on 
quasi-random data similar to ours, but give no theoretical analysis of the results. 

One of the recent approaches to the problem is [6] and our work offers an alternative 
approach based on completely different methods. It indicates that simple and efficient 
stochastic algorithms can be very effective in the context of categorical clustering in 
practice. Theoretically, our approach is demonstrably superior because in contrast to 
[6], we are able to prove rigorous and quite general results about the quality and the 
efficiency of the algorithms. 



2 Planted Models 

The so-called planted partition models [8,4] are models of random graphs that are the 
natural setting for the study of partitioning algorithms. In the planted bisection model, 
one has a bipartition with n/2 nodes on each side (with even n), which we can refer 
to as blue and green vertices. An edge is placed between like-colored vertices with 
probability p and between differently-colored vertices with probability r < p. The 
reason for focussing on this model is that if p − r > for any fixed ε > 0, then,

with probability 1 — exp(— the planted bisection is the unique bisection with 
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minimum cut size. Hence we can demand of a bisection algorithm that given a random 
graph generated in this model (without revealing the planted bisection, of course), the 
algorithm must zero in on this unique minimum cut. 

A natural extension [4] is the planted ℓ-partition model, where we have ℓ groups
of n/ℓ vertices each (with ℓ dividing n), each group corresponding to one of ℓ distinct
colors. As before, an edge is placed between like-colored vertices with probability p 
and between differently-colored vertices with probability r < p. One can show that if 
p — r > for any fixed e > 0, then, with probability 1 — exp(— the 

planted partition by color classes as above is the unique multipartition (into ℓ classes)
with minimum multi-cut size. Hence we can demand of a partitioning algorithm that 
given a random graph generated in this model (without revealing the planted partition, 
of course), the algorithm must zero in on this unique minimum multicut. 

There is a natural extension of these models to our scenario with two categories of 
data, which we will think of as people (P) and movies (M). We will refer to these as 
bicategorical planted models. In the bisection model, we have n people and m movies, 
both divided into two equi-sized classes corresponding to two colors (so both n, m are 
even). In the ℓ-partition bicategorical model, both people and movies are divided into
ℓ equi-sized classes corresponding to the colors. In both cases, an edge is placed
between a person and a movie of the same color with probability p and between a person
and movie of different colors with probability r < p. Denote the resulting random
bipartite graph by (P, M, E).
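
The bisection case of the bicategorical planted model can be sampled directly from its definition. The following is a minimal sketch, not the authors' generator: the function name `planted_bipartite`, the convention that the first half of each side is color 0, and the `seed` parameter are assumptions made for illustration.

```python
import random

def planted_bipartite(n, m, p, r, seed=0):
    """Sample a bicategorical planted bisection graph.

    People are 0..n-1 and movies are 0..m-1. By (assumed) convention the
    first half of each side has color 0 and the second half color 1.
    Returns (edges, people_color, movie_color), where edges is a set of
    (person, movie) pairs.
    """
    if n % 2 or m % 2:
        raise ValueError("n and m must be even")
    rng = random.Random(seed)
    people_color = [0 if i < n // 2 else 1 for i in range(n)]
    movie_color = [0 if j < m // 2 else 1 for j in range(m)]
    edges = set()
    for i in range(n):
        for j in range(m):
            # Like-colored pairs get probability p, the others probability r.
            prob = p if people_color[i] == movie_color[j] else r
            if rng.random() < prob:
                edges.add((i, j))
    return edges, people_color, movie_color
```

The double loop costs time proportional to n·m, which is fine for small experiments; it is not meant for the very large instances discussed in the experimental section.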

We will assume that n > m. In both the (bicategorical) bisection and multipartition
case, it can be shown that if p — r > for any fixed e > 0, then, with probability 

1 — exp(— ), the planted (bicategorical) partition is the unique one which minimizes 
the number of non-monochromatic edges. We will assume henceforth that Δ := p − r >
for some fixed ε > 0.

3 The Algorithm 

In this section, we present our linear-time algorithm for the collaborative filtering
problem with two categories (“people” and “movies”) and two classes or groups in each
category. (In § 5 we indicate how the algorithm can be extended to several classes in
each category.)

The algorithm runs in four phases. The purpose of the first two phases is to build 
a statistically accurate partition of the smaller category (“movies”) i.e. a partition in 
which the two parts of the partition are almost monochromatic: almost all vertices are 
of the same color with a few vertices of the opposite color. The third phase will use 
this statistically accurate partition to construct an exact partition of the other category 
(“people”). Finally, the fourth phase will use the exact partition of the first category 
(“people”) to construct an exact partition of the second category (“movies”). 

Phase 1. In this phase, the algorithm will incrementally and simultaneously build a
partition of both categories: $(L_1^P, R_1^P)$ will be a partition of “people” and
$(L_1^M, R_1^M)$ will be a partition of “movies”. Initially, both partitions are empty, and they are
built up by interleaving the addition of two “people” vertices and two “movie”
vertices into the two partitions. In each of $t_1$ steps, choose a pair of
“people” nodes $(p_1, p_2)$ and a pair of “movie” nodes $(m_1, m_2)$, picked randomly
and uniformly from the unexamined nodes. To classify the “people” nodes, we
examine the number of edges from $p_1$ and $p_2$ to the partition of the “movies”. Let
$\ell_i, r_i$ be the number of edges from vertex $p_i$ to $L_1^M$ and $R_1^M$ respectively, for
$i = 1, 2$. Let $X := \ell_1 - r_1 - \ell_2 + r_2$. If $X > 0$, place $p_1$ in $L_1^P$ and $p_2$ in $R_1^P$, and
if $X < 0$, place them in the opposite manner. If $X = 0$, place them equiprobably
into $L_1^P$ and $R_1^P$. Similarly use the “people” partition to place the movie nodes in
the appropriate parts of the “movies” partition.

Phase 2. In this step, we build a partition $(L_2^M, R_2^M)$ of “movies”. Initially this is empty.
Choose m/4 new pairs of “movie” vertices uniformly at random from the
unexamined vertices and assign them greedily to $L_2^M$ or $R_2^M$ using the “people”
partition $(L_1^P, R_1^P)$ as in the first phase. Note that all pairs may be processed
concurrently.

Phase 3. In this phase, we build the final partition of the “people”, $(L^P, R^P)$. Once
again, this is built up starting from the empty partition. We repeatedly pick a “people”
node from all the “people” vertices (including the ones processed in the first phase)
and assign it greedily using the “movies” partition $(L_2^M, R_2^M)$: it is added to $L^P$ if
it has more edges to $L_2^M$ than to $R_2^M$.

Phase 4. Finally we build the final partition of the “movies”, $(L^M, R^M)$. This is built
up starting from the empty partition. We repeatedly pick a “movie” node from all
the “movie” nodes (including those processed in the earlier phases) and assign it
greedily using the “people” partition $(L^P, R^P)$ symmetrically as in Phase 3.
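
The four phases can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the authors' implementation. The names `place_pair` and `four_phase` are hypothetical, the number of Phase 1 steps `t1` is left as a parameter because its value is not reproduced here, and ties in Phases 3 and 4 are (by assumption) sent to the right-hand part. Adjacency is given as dictionaries mapping each vertex to the set of its neighbours on the other side.

```python
import random

def place_pair(a, b, adj, left, right, rng):
    """Greedy pair rule: X = l_a - r_a - l_b + r_b, where l and r count
    edges into the current partition (left, right) of the other category.
    Returns (vertex placed left, vertex placed right)."""
    x = (len(adj[a] & left) - len(adj[a] & right)
         - len(adj[b] & left) + len(adj[b] & right))
    if x == 0:
        x = rng.choice((-1, 1))  # equiprobable placement on a tie
    return (a, b) if x > 0 else (b, a)

def four_phase(people_adj, movie_adj, t1, seed=0):
    rng = random.Random(seed)
    people, movies = list(people_adj), list(movie_adj)
    rng.shuffle(people)
    rng.shuffle(movies)

    # Phase 1: grow both partitions together, one pair per side per step.
    lp1, rp1, lm1, rm1 = set(), set(), set(), set()
    for s in range(t1):
        a, b = place_pair(people[2 * s], people[2 * s + 1],
                          people_adj, lm1, rm1, rng)
        c, d = place_pair(movies[2 * s], movies[2 * s + 1],
                          movie_adj, lp1, rp1, rng)
        lp1.add(a); rp1.add(b); lm1.add(c); rm1.add(d)

    # Phase 2: m/4 fresh movie pairs, placed using the Phase 1 people partition.
    lm2, rm2 = set(), set()
    rest = movies[2 * t1:]
    for s in range(min(len(movies) // 4, len(rest) // 2)):
        c, d = place_pair(rest[2 * s], rest[2 * s + 1],
                          movie_adj, lp1, rp1, rng)
        lm2.add(c); rm2.add(d)

    # Phase 3: every person goes left iff it has more edges to lm2 than to rm2.
    lp = {p for p in people
          if len(people_adj[p] & lm2) > len(people_adj[p] & rm2)}
    rp = set(people) - lp

    # Phase 4: every movie is placed symmetrically using (lp, rp).
    lm = {m for m in movies
          if len(movie_adj[m] & lp) > len(movie_adj[m] & rp)}
    rm = set(movies) - lm
    return (lp, rp), (lm, rm)
```

On small inputs the random walk of Phase 1 may fail to build up any imbalance, in which case the sketch can return a degenerate partition; the guarantees in the analysis are asymptotic.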



4 Analysis 

4.1 Overview 

The progress of the algorithm is measured as in [8,4] by tracking the color imbalance of 
the partitions. The color imbalance of a partition is one half of the number of vertices of 
one color in one half of the partition minus the number of vertices of the same color in the 
other half. In our case, there are two color imbalances to track: $k^P$, the color imbalance
of the “people” partition and $k^M$, the color imbalance of the “movies” partition. The
analysis proceeds by showing that each of the following statements is true with high
probability, by which we mean throughout, probability 1 − exp(−

- At the end of Phase 1, both color imbalances $k^P$, $k^M$ are at least

- At the end of Phase 2, the color imbalance $k^M$ is $\Omega(m)$.

- At the end of Phase 3, no node in $L^P$ has the same color as a node in $R^P$, i.e.
$(L^P, R^P)$ is a perfect partition of the “people”.

- At the end of Phase 4, no node in $L^M$ has the same color as a node in $R^M$, i.e.
$(L^M, R^M)$ is a perfect partition of the “movies”.
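
The color imbalance defined above is simple to compute when the true colors are known, for instance in a simulation. The following is a small sketch; the function name `color_imbalance` and the choice of color 0 as the reference color are assumptions.

```python
def color_imbalance(left, right, color, reference=0):
    """Half of (#reference-colored vertices in `left`
    minus #reference-colored vertices in `right`)."""
    in_left = sum(1 for v in left if color[v] == reference)
    in_right = sum(1 for v in right if color[v] == reference)
    return (in_left - in_right) / 2
```

For a perfect partition of n vertices into two color classes this value is n/4 in absolute terms when both halves are counted against one reference color, and 0 for a partition that mixes the colors evenly.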

4.2 A Coupling Argument 

The analysis of the evolution of the color imbalances $k^P$, $k^M$ is complicated by the fact
that the steps in the underlying stochastic processes are not independent and the transition




Analysis and Experimental Evaluation of a Simple Algorithm 






probabilities depend on the history of the process. To overcome this difficulty, we
employ the coupling argument from [4, Lemma 4] or [8, Theorem 6.2], which relates the
behavior of the process to a simple random walk with independent increments. Thus we
can focus on analyzing the evolution of $k^P$ and $k^M$ as a simple random walk on the line.

4.3 Phase 1 

At the end of Phase 1, we claim that both the imbalances $k^P$ and $k^M$ are at least
with high probability. The proof analyzes how the imbalances grow with time. The 
following claim is the key to the analysis and is very similar to Claim 2 in [4]. It shows 
that 

(b) At every iteration of the greedy partition building process, both imbalances are at 
least as likely to increase as to decrease. 

(c) The two imbalances mutually reinforce each other: the higher $k^P$ is, the more
likely $k^M$ is to increase, and vice versa. This is a key additional insight over the
Condon-Karp analysis.

Proposition 1. Let $k^P = k^P(t)$ and $k^M = k^M(t)$ denote the imbalances in the
partitions $(L_1^P, R_1^P)$ and $(L_1^M, R_1^M)$ respectively at time step t of Phase 1. Let k denote either
$k^P$ or $k^M$. Then, at the next time step, for any history of the process till time t,

(a) $\Pr[k \text{ increases}] = \Omega(1)$.

(b) $\Pr[k \text{ increases}] - \Pr[k \text{ decreases}] \ge 0$.

(c) Ifk^,k^ = then 

$\Pr[k^P \text{ increases}] - \Pr[k^P \text{ decreases}] = \Omega(\min(k^M \Delta / \sqrt{t}, 1))$

and

$\Pr[k^M \text{ increases}] - \Pr[k^M \text{ decreases}] = \Omega(\min(k^P \Delta / \sqrt{t}, 1))$.

The proof of this proposition is exactly the same as the proof of Claim 2 in [4] 
using the quantitative version of the Central Limit Theorem called Esseen’s inequality 
[13, Theorem 3, p. 111].

The analysis of Phase 1 is now completed by dividing the evolution into two parts: 

- In the first part, the evolution of both $k^P$ and $k^M$ can be treated as an unbiased random

walk by Proposition 1, part (b). Thus from standard results on the unbiased random 
walk [5][XIV.3], the expected time for the random walks to reach starting 

at 0 is TO Applying Markov’s inequality, the walk reaches in 

steps, and hence the probability that both walks reach to^/^“^/^ in steps 

(not necessarily at the exact same moment) is 1 — exp(— 17 (to'^/^)). 

- In the second part, both random walks now have a distinct drift towards increasing:
the difference between the probability of an increase and a decrease for the random
walk of $k^P$ when the random walk for $k^M$ is at position i is, by Proposition 1, part (c),

5 = f2(min(iZ\/v/f, 1)) = fi{TaiYi{iA/ 1)). Similarly for k^ . By a standard 







D. Dubhashi, L. Laura, and A. Panconesi 



result on biased random walks [5][XIV.3], the expected time for each of 
to double is 0{i/S) = and so with high probability, it doubles in time 

steps. Thus after ti = steps, both random walks reach 

with high probability. A similar analysis shows that once this imbalance is reached, 
it remains at least half this value with high probability. 

4.4 Phase 2 

In this phase, only the “movies” partition evolves. Each pair of nodes $(m_1, m_2)$
contributes 1/2, 0 or −1/2 to $k^M$. This contribution is determined by the value of $k^P$, which,

from the analysis of Phase 1 is at least rn}~^. Now we use Claim 7 from [4]: 

Proposition 2. Suppose the imbalance k^ at the end of Phase 1 is at least rn}~'^. Let 
$(m_1, m_2)$ be a pair of “movie” vertices examined in Phase 2. Then

$\Pr[(m_1, m_2) \text{ increases } k^M] - \Pr[(m_1, m_2) \text{ decreases } k^M] = \Omega(1)$.

Thus the value of $k^M$ at the end of Phase 2 dominates $\sum_{1 \le i \le m/4} Z_i$, where the $Z_i$
are independent random variables taking values in the set {1/2, 0, −1/2} with $\Pr[Z_i = 1/2] - \Pr[Z_i = -1/2] = \Omega(1)$
for each i. Thus $k^M = \Omega(m)$ with probability at least
$1 - \exp(-\Omega(m))$ at the end of Phase 2.
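
The last step is a standard concentration bound. As a worked sketch, assume the drift of each $Z_i$ is at least some constant $c > 0$ (the name $c$ is introduced here only for illustration) and apply Hoeffding's inequality to the $N = m/4$ independent variables, each of which lies in an interval of width 1:

```latex
\begin{align*}
S &:= \sum_{i=1}^{m/4} Z_i, \qquad
\mathbb{E}[Z_i] = \tfrac{1}{2}\bigl(\Pr[Z_i = \tfrac12] - \Pr[Z_i = -\tfrac12]\bigr) \ge \tfrac{c}{2},
\qquad \mathbb{E}[S] \ge \tfrac{c\,m}{8}, \\
\Pr\bigl[S \le \mathbb{E}[S] - t\bigr] &\le \exp\!\Bigl(-\frac{2t^2}{N}\Bigr)
  = \exp\!\Bigl(-\frac{8t^2}{m}\Bigr), \\
\text{so with } t = \tfrac{c\,m}{16}: \quad
\Pr\bigl[S \le \tfrac{c\,m}{16}\bigr] &\le \exp\!\Bigl(-\frac{c^2 m}{32}\Bigr).
\end{align*}
```

Hence $S = \Omega(m)$ except with probability $\exp(-\Omega(m))$, which matches the bound stated for $k^M$.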

4.5 Phase 3 

This phase gives rise to an exact partition of the “people” using the statistically accurate
partition of the “movies” produced in Phase 2. For a “people” node p, let $\ell_2(p)$ denote the
number of edges from p to $L_2^M$. As in Claim 9 of [4], it follows by a simple application
of the concentration of measure that with high probability,

1^2 (p) - E[£2{p)] \ < 

Next, for vertices p and p' of opposite colors, a simple calculation shows that 

\E[£2{p)] - E[i2{p')]\ = k^A = f?(m-i/"+^). 

Thus the values of $\ell_2(p)$ are sharply clustered around these two widely separated values,
and so with high probability, all “people” nodes are correctly partitioned in Phase 3.

4.6 Phase 4 

The analysis of this phase is exactly the same as the analysis of Phase 3, with the roles of
“people” and “movies” reversed (the additional fact that the imbalance $k^P$ now equals
the maximum possible value, n/2, only helps). Thus with high probability, the
“movie” nodes are correctly partitioned in this phase.

Theorem 1. As long as p − r > for some constant ε > 0, the algorithm

applied to a graph generated by the bi-categorical planted model with two classes in 
each category, discovers the underlying hidden planted structure exactly with probability 
1 — exp(— m^*^"^^). 
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5 Extensions 

5.1 Multi-planted Models 

The algorithm and analysis can be extended to the more general case of the multi-planted
models where each category is divided into ℓ > 2 color classes as in [4]. In the recursive
algorithm, we first split both categories into two parts as above, and then recurse on the
two sub-parts. In the non-recursive version of the algorithm, we modify Phase 3 above.
The statistically accurate partition $(L_2^M, R_2^M)$ built in Phase 2 has the additional
property that the various color imbalances are well-separated from each other. This can
be used to produce an exact partition of the “people” nodes by examining the number
of edges to one half of the partition as in [4]. In our experiments below we exploit this
fact.

5.2 Noisy Data Models 

One can define a model of quasi-random planted data that also incorporates noise: in
addition to the edges defined in the planted model, we also introduce random edges
between “people” and “movies”, placing an edge between a pair of people-movie nodes
independently with probability δ. As long as the noise parameter δ is small, specifically,
δ = o(p − r), the algorithm is unaffected and continues to work correctly. In this sense
our algorithm is robust to noisy data.
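
The noise model can be layered on top of any planted edge set. A minimal sketch, assuming edges are stored as a set of (person, movie) index pairs; the function name `add_noise` and the `seed` parameter are hypothetical.

```python
import random

def add_noise(edges, n, m, delta, seed=0):
    """Return a copy of `edges` in which every (person, movie) pair is
    additionally connected, independently, with probability delta."""
    rng = random.Random(seed)
    noisy = set(edges)
    for i in range(n):
        for j in range(m):
            if rng.random() < delta:
                noisy.add((i, j))
    return noisy
```

A pair that is already an edge simply stays an edge, so the effective edge probabilities become p + δ(1 − p) and r + δ(1 − r), and their difference shrinks only by a factor of 1 − δ.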

5.3 Unequal Class Sizes 

It is fairly clear from the description and analysis above that the algorithm and its analysis 
are fairly robust. In particular they do not really depend on each color class in the planted 
model being of the same size. The algorithm and analysis will continue to hold as long 
as the ratio of the largest group to the smallest group is not too large. While we defer the 
full analysis to the final version, here we give some good empirical evidence indicating 
that the algorithm performs well in this case too. 

6 Experimental Results 

The input graphs for our experiments are random bipartite graphs of the form G =
(P, M, E) that we shall now describe.

- P is the set of “people”, $P = P_1 \cup P_2$, where $P_1$ is the subset of P of color 1 and
$P_2$ is the subset of P of color 2.

- M is the set of movies, $M = M_1 \cup M_2$, where $M_1$ is the subset of M of color 1
and $M_2$ is the subset of M of color 2.

- $|P| = n$, $|M| = m$, and $|P_1| = |P_2| = n/2$, $|M_1| = |M_2| = m/2$.

- p is the probability of like-colored edges, that is, the probability of an edge between
elements of $P_1$ and $M_1$ and between elements of $P_2$ and $M_2$.

- r is the probability of different-colored edges, that is, the probability of an edge
between elements of $P_1$ and $M_2$ and between elements of $P_2$ and $M_1$.

- The output of the algorithm consists of four sets $L^P, R^P, L^M, R^M$ such that $P = L^P \cup R^P$
and $M = L^M \cup R^M$.

Let us assume, without loss of generality, that color 1 is the dominating color in the set
$L^P$ (if it is not, we can swap $L^P$ with $R^P$ and $L^M$ with $R^M$). To measure the quality of
the result we shall make use of the following quantities:

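- $L_1^P$ is the percentile presence of color 1 in the set $L^P$, or, more formally,

  $L_1^P = \frac{|L^P \cap P_1|}{|L^P|} \cdot 100.$

  Similarly,

  $R_2^P = \frac{|R^P \cap P_2|}{|R^P|} \cdot 100,$

  and

  $L_1^M = \frac{|L^M \cap M_1|}{|L^M|} \cdot 100,$

  and

  $R_2^M = \frac{|R^M \cap M_2|}{|R^M|} \cdot 100.$

- With $\mu$ we denote the average of the previous quantities, i.e.,
  $\mu = \frac{L_1^P + R_2^P + L_1^M + R_2^M}{4}.$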
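
These quality measures are straightforward to compute from the output sets and the true color classes. The sketch below assumes the reading in which the left parts are scored against color 1 and the right parts against color 2; the function name `quality`, the dictionary keys, and the convention that an empty part scores 0 are all assumptions.

```python
def quality(LP, RP, LM, RM, P1, P2, M1, M2):
    """Percentage of the intended color in each output part, plus the mean."""
    def pct(part, cls):
        # Share of `part` that belongs to the true class `cls`, in percent.
        return 100.0 * len(part & cls) / len(part) if part else 0.0

    scores = {
        "L1P": pct(LP, P1),
        "R2P": pct(RP, P2),
        "L1M": pct(LM, M1),
        "R2M": pct(RM, M2),
    }
    mu = sum(scores.values()) / 4
    scores["mu"] = mu
    return scores
```

A perfect partition scores 100 on all four measures, while a partition that is independent of the colors scores about 50.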



Thus, the 4 parameters {m, n, p, r} fix a specific distribution of input bipartite graphs.

In our experiments we are interested in speed and accuracy, for the 2 by 2 as well as
the k by k case.

Our experimental results for the accuracy in the 2 by 2 case are summarized in Figs. 1
through 3. Every “point” in the plots is the average taken over 100 different runs of the
algorithm: we generated 10 bipartite graphs and for each of them we ran the algorithm
10 times. Unless stated otherwise, these are the values for the parameters: n = 100, m =
80, p = 0.7, r = 0.1.

Fig. 1 shows the performance of the algorithm against p − r, the probability of
like-colored edges minus the probability of differently-colored edges. Specifically r is kept
fixed at 0.1, while p increases from 0.3 to 0.9. On the y-axis we report the “quality”
of the partitioning made by the algorithm. For each value of p − r there are 4 plots,
corresponding to $L_1^P$, $R_2^P$, $L_1^M$ and $R_2^M$. Notice that these values quickly rise from 60%
(mediocre accuracy) to over 90% (excellent accuracy), as p − r increases.



In Fig. 2 we keep the size of the movie set fixed and increase the size of the people set.
In this way we get an indication of the performance of the algorithm when the relative sizes
differ. Here |P| = n = αm = α|M|, m = 100 is kept fixed while α = 1, 2, . . . , 10.
Here, p − r = 0.6. On the x-axis we have α while, as before, on the y-axis we have
$L_1^P$, $R_2^P$, $L_1^M$ and $R_2^M$.

As can be seen, the algorithm is fairly robust and its performance does not depend
on α and hence on the relative sizes of the sides of the bipartition.

Fig. 1. Accuracy versus p − r.



Finally in Fig. 3 we analyze “scalability”. That is, we blow up the sizes of both sides of
the bipartition. Here, |P| = n = βn* and |M| = m = βm*, where n* = 100, m* = 80
and β goes from 1 to 10. As with the previous test, and consistently with the analysis,
the quality of the result does not depend on the scaling factor β: all the percentiles are
over 90%, which means that the size of the sets does not influence the quality of the results
(provided that they are big enough). We plot $L_1^P$, $R_2^P$, $L_1^M$ and $R_2^M$ versus β.

Let us now switch to the k by k case. Recall that the algorithm proceeds as follows. We
first compute a 2 by 2 partition $M_1, M_2$ and $P_1, P_2$. The algorithm knows k, the number
of clusters on each side. To partition the set of people into k clusters we compute for
each vertex the quantity

#edges from the vertex to $M_1$ − #edges from the vertex to $M_2$.

For a given true cluster $P_i$, these quantities happen to be highly concentrated and, for
different $P_i$ and $P_j$, quite spread apart. It is therefore easy to see k “clumps” corresponding
to the k clusters. The same approach is used to cluster the movies.
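
One simple way to read off the k clumps is to sort the vertices by this quantity and cut at the k − 1 widest gaps. The text does not say how the clumps are identified, so the gap rule below is an assumption made for illustration, as are the names `edge_score` and `clumps`.

```python
def edge_score(neighbours, M1, M2):
    """#edges from a vertex to M1 minus #edges from the vertex to M2."""
    return len(neighbours & M1) - len(neighbours & M2)

def clumps(scores, k):
    """Split vertices into k groups by cutting the sorted scores at the
    k - 1 largest gaps between consecutive values.

    scores: dict mapping vertex -> edge_score of that vertex.
    """
    order = sorted(scores, key=scores.get)
    # Index i marks a candidate cut between order[i - 1] and order[i].
    by_gap = sorted(range(1, len(order)),
                    key=lambda i: scores[order[i]] - scores[order[i - 1]],
                    reverse=True)
    cuts = sorted(by_gap[:k - 1])
    groups, start = [], 0
    for c in cuts + [len(order)]:
        groups.append(order[start:c])
        start = c
    return groups
```

If two true clusters have nearly the same expected score, this rule merges them into one clump, which is consistent with the merging behaviour reported in Table 1.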

Our results are presented in Table 1. The parameters have the following values:
n = m = 100,000 nodes on each side of the bipartition, and r = 0.1. The table shows
the results for different values of p, the probability of like-coloured edges.

The algorithm displays the following curious behaviour. To fix ideas, consider
the case k = 4. Table 1 for p = 0.7 reports the following figures:

(1,1,1,1) = 4%
(2,1,1,0) = 12%
(2,2,0,0) = 84%.

Fig. 2. Accuracy versus relative sizes.



Table 1. Results for k > 2 clusters.

           k = 3            k = 4              k = 5

p = 0.7    (1,1,1) = 83%    (1,1,1,1) = 4%     (1,1,1,1,1) = 0%
           (2,1,0) = 17%    (2,1,1,0) = 12%    (2,1,1,1,0) = 0%
                            (2,2,0,0) = 84%    (2,2,1,0,0) = 70%
                                               (3,2,0,0,0) = 30%

p = 0.8    (1,1,1) = 74%    (1,1,1,1) = 3%     (1,1,1,1,1) = 0%
           (2,1,0) = 26%    (2,1,1,0) = 17%    (2,1,1,1,0) = 0%
                            (2,2,0,0) = 80%    (2,2,1,0,0) = 63%
                                               (3,2,0,0,0) = 37%

p = 0.9    (1,1,1) = 80%    (1,1,1,1) = 0%     (1,1,1,1,1) = 0%
           (2,1,0) = 20%    (2,1,1,0) = 10%    (2,1,1,1,0) = 0%
                            (2,2,0,0) = 90%    (2,2,1,0,0) = 84%
                                               (3,2,0,0,0) = 16%



This means that only 4 per cent of the time was the algorithm able to produce the 4
correct clusters. 12 per cent of the time the algorithm produced 3 clusters,
one of which contained 2 true clusters, with the remaining two containing one true
cluster each. Finally, 84 per cent of the time, the algorithm produced just 2 clusters,
each containing 2 true clusters together. Strange as it may seem at first, this does not imply
that the algorithm is working poorly!

First, the algorithm might clump true clusters together in one bigger cluster, but it
never splits a true cluster into two or more. Second, the algorithm works very well in
the 2 by 2 and 3 by 3 case (see table), so if bigger clusters are produced they can be
separated by running the algorithm again.

Fig. 3. Scalability. Here the sizes of the people and movie sets are n = βn* and m = βm*, with
n*, m* fixed.



Finally, we come to the running time. The results are shown in Figure 4, reporting an
implementation with external memory due to the very large sizes we tested the algorithm
with. As can be seen, the algorithm is quite fast.

In what follows we describe some of the implementation issues. Our implementation
is an example of so-called semi-external graph algorithms, i.e., algorithms that store
information about the nodes in main memory and keep the edges in secondary memory. We
decided to have a redundant representation of the graph in secondary memory, and we
store the edge list twice, once sorted primarily by people and secondarily by movies, and
once in the opposite way. Every edge list is represented in secondary memory with
three arrays: one (the main) contains the list of the pointed nodes, and the other two (the
indexes) give, for every pointing node, the outdegree and the starting position of its edges
in the first array. So, for the first edge list (sorted by people) we store 2n + #(edges)
items, and, similarly, for the second list we use 2m + #(edges) items. The total disk
space is therefore 2n + 2m + 2 · #(edges) items, where every item stores a single node.
This representation occupies only 2n + 2m more items compared to the simple edge
list (where we store 2 items for each edge), but in this way we can compare two sets in
linear time, and we use this property in all the phases of the algorithm (except the first
one). We know that, in the Data Stream model, such a representation that divides the
data could sound surprising, but the index arrays (the ones that store the outdegree and
position of pointed nodes) are buffered and the streamed data is the main array.
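
The three-array layout can be illustrated in memory with ordinary lists standing in for the on-disk arrays. This is a sketch of the data structure only, not of the external-memory implementation; the names `build_edge_list` and `neighbours` are hypothetical.

```python
def build_edge_list(edges, num_sources):
    """Build (main, outdegree, start) for edges given as (source, target).

    main      : targets of all edges, sorted by source and then by target
    outdegree : outdegree[s] is the number of edges leaving source s
    start     : start[s] is the position in `main` of the first edge of s
    """
    edges = sorted(edges)
    outdegree = [0] * num_sources
    for s, _ in edges:
        outdegree[s] += 1
    start = [0] * num_sources
    for s in range(1, num_sources):
        start[s] = start[s - 1] + outdegree[s - 1]
    main = [t for _, t in edges]
    return main, outdegree, start

def neighbours(v, main, outdegree, start):
    """Sorted neighbours of v, read as one contiguous slice of `main`."""
    return main[start[v]:start[v] + outdegree[v]]
```

The second copy, sorted by movies, is obtained by calling `build_edge_list` on the reversed pairs. Because each neighbour slice is sorted, two neighbour sets can be compared with a single merge-style scan.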

We also note that a graph generated according to the model we described has an
expected number of edges equal to $E = n \cdot (p \cdot \frac{m}{k} + r \cdot (k - 1) \cdot \frac{m}{k})$. So, if m = n =
100,000, k = 4, p = 0.7 and r = 0.1 we have more than 2 billion edges. We use 12 bytes
per node in the index arrays (4 for the outdegree and 8 for the starting position in the
main array) plus 4 bytes for each node in the main array. So, with the above quantities,
we use approximately 40 GB of hard disk space.

Fig. 4. Time performance of the semi-external memory algorithm.
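
The expected edge count can be checked with a short calculation. The sketch assumes k equal-sized classes on each side, so that every person has m/k like-colored movies and (k − 1)·m/k differently-colored ones; the function name `expected_edges` is hypothetical.

```python
def expected_edges(n, m, k, p, r):
    """Expected number of edges in the k-class bicategorical planted model."""
    like = m / k                 # movies sharing a given person's color
    unlike = (k - 1) * m / k     # movies of any other color
    return n * (p * like + r * unlike)
```

For n = m = 100,000, k = 4, p = 0.7 and r = 0.1 this evaluates to about 2.5 billion, in line with the "more than 2 billion edges" quoted above.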

7 Application: Probabilistic Analysis of LSI 

Papadimitriou et al. [12] took a first step in developing a probabilistic model and analysis
to explain the effectiveness of LSI in diverse applications. They introduced a probabilistic
corpus model of generating data. There is an underlying universe of all terms, U. A topic
is a probability distribution on U. A meaningful topic is very different from the uniform
distribution on U; usually it will be concentrated on a small set of terms that are highly
relevant to a particular subject. A corpus is said to be ε-separable (for 0 < ε < 1) if each
topic T in the corpus is associated with a subset of terms $U_T$, called the primary terms
of T, such that $U_T \cap U_{T'} = \emptyset$ for different topics T, T' and each topic T concentrates
probability at least 1 − ε on $U_T$. A pure corpus model is a probability distribution $\mathcal{D}$
on pairs (T, ℓ) where T is a topic and ℓ is a positive integer. The corpus model will be
called uniform if

- For any two topics T, T', the set of primary terms has the same size: $|U_T| = |U_{T'}|$.

- The distribution $\mathcal{D}$ is uniformly distributed on all topics.

A document is generated from such a corpus by a two-stage process: first a pair (T, ℓ)
is picked according to $\mathcal{D}$. Then terms are sampled ℓ times according to the distribution
T. A corpus of size n is formed by repeating this two-stage sampling process n times to
form n documents.
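
The two-stage sampling process can be sketched as follows. Topics are represented as dictionaries mapping terms to probabilities, the topic is drawn uniformly as in the uniform model, and the document length is drawn uniformly from a supplied list; that last choice, like the name `generate_corpus`, is an assumption, since the model only requires some distribution on lengths.

```python
import random

def generate_corpus(topics, lengths, n, seed=0):
    """Generate n documents from a pure, uniform corpus model.

    topics  : list of dicts, each mapping term -> probability
    lengths : list of positive integers to draw document lengths from
    Returns a list of (topic_index, list_of_terms) pairs.
    """
    rng = random.Random(seed)
    corpus = []
    for _ in range(n):
        # Stage 1: pick a pair (T, l), with T uniform over the topics.
        t = rng.randrange(len(topics))
        length = rng.choice(lengths)
        # Stage 2: sample l terms independently from the distribution T.
        terms, weights = zip(*topics[t].items())
        doc = rng.choices(terms, weights=weights, k=length)
        corpus.append((t, doc))
    return corpus
```

Linking each document to the distinct terms it contains yields the random bipartite document-term graph considered next.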

Let us look at the random bipartite graph between documents and terms generated
by such a process of data generation. It is clear that the resulting graph is very similar to
a planted partition model, with partitions corresponding to topics on the topic side and
to documents belonging to a particular topic on the document side. The only difference
is that not all edges from a document to a term within its primary set of terms have the
same probability. However, a similar analysis shows that the same algorithm succeeds
with high probability in recovering the correct partitions.

Theorem 2. The collaborative filtering algorithm applied to a pure, uniform ε-separable
corpus model will generate the correct bi-categorical partition of documents and terms
with very high probability, as long as 1 − 2ε > (In the terminology of [12] it
will be 0-skewed.)

It seems likely that our methods will work well for the corpus model in [12] and also
more realistic generalizations of it. We hope to develop this in future work.

Abstract. Computational genomics involves comparing sequences
based on “similarity” for detecting evolutionary and functional relation- 
ships. Until very recently, available portions of the human genome se- 
quence (and that of other species) were fairly short and sparse. Most 
sequencing effort was focused on genes and other short units; similarity 
between such sequences was measured based on character level differ- 
ences. However with the advent of whole genome sequencing technology 
there is emerging consensus that the measure of similarity between long 
genome sequences must capture the rearrangements of large segments 
found in abundance in the human genome. 

In this paper, we abstract the general problem of computing sequence 
similarity in the presence of segment rearrangements. This problem is 
closely related to computing the smallest grammar for a string or the 
block edit distance between two strings. Our problem, like these other 
problems, is NP-hard. Our main result here is a simple O(1) factor
approximation algorithm for this problem. In contrast, the best known
approximations for the related problems are a factor Ω(log n) off from the optimal.
Our algorithm works in linear time, and in one pass. In proving our re- 
sult, we relate sequence similarity measures based on different segment 
rearrangements, to each other, tight up to constant factors. 



1 Introduction 

Similarity comparison between biomolecular sequences plays an important role
in computational genomics due to the premise that sequence similarity usually 
indicates evolutionary and functional similarity. For example, popular computa- 
tional tools for both multiple sequence alignment and evolutionary tree construc- 
tion are based on iteratively measuring the similarity between pairs of available 
sequences. 

* Funda Ergun’s research supported in part by NSF grant CGR 0311548.
Muthukrishnan’s research supported in part by NSF EIA 0087022, NSF ITR 0220280 and NSF

With recent advancements in genome sequencing technologies we now have
access to complete genome sequences of humans and many other species [37, 
23]. As a result, the notion of similarity between genome sequences has changed 
dramatically in the last three years. In particular, due to the availability of short 
fragments only, previous notions of similarity between genome sequences focused 
on edit operations mimicking “point mutations”, i.e., single character replace- 
ments, insertions and deletions only. Until recently precious little was known 
about large range/scale evolutionary mechanisms for genome modification. With 
a much improved understanding of the human genome sequence composition, ge- 
nomicists are now studying genome-wide structural modifications. (See [12] for 
some of the recent topics of interest.) There is emerging consensus that in or- 
der to capture evolutionary mechanisms underlying genome evolution one must 
consider a richer set of sequence modifications than single character edits. This 
has led to a new breed of sequence comparison methods that capture similarity
based on not merely individual nucleotides (i.e. characters) but involving longer 
“segments” of the genome. 

This paper formalizes the sequence similarity problem in the presence of seg- 
ment rearrangements, based on insights from half a dozen biological studies over 
the past three years (e.g. [25,4,26,5]). This problem is related to the study of 
string grammars and certain “block” edit distances, but it is still quite distinct 
from these and other string comparison problems in the literature. Known meth- 
ods for computing sequence similarity in the presence of segment rearrangements 
rely on greedy heuristics, but provide no provable guarantees. Our main result 
here is a simple and efficient greedy algorithm that provably approximates the 
measure of similarity defined, up to an O(1) factor. In contrast, the best known
approximation ratios on related sequence comparison problems are Ω(log n). Our
proof method connects different models for transforming a sequence into another 
via segment rearrangements and shows them to be equivalent modulo constant 
factor approximations. Furthermore, our algorithm works in one pass, with sub- 
linear work space - which is desirable for handling massive sequences such as 
whole genomes - trading off space for accuracy. See [13] for background in ge- 
nomics; here, we present our formal problem and our results. 



1.1 Algorithmics: The Formal Problem 

The focus of this paper is computing the distance between two strings in terms 
of edit operations that transform one string into the other. The edit operations 
of interest are 

— character edits: character insertions, deletions and replacements, and 

— segment edits: substring relocations, deletions, and duplications. 

We define the distance from a given string R into another S, denoted d(R → S),
as the minimum number of permitted edit operations to transform R into S. The
edit operations may overlap arbitrarily. Note that this distance is not symmetric
since arbitrary deletions are allowed: simply consider the case where R is the
empty string and S is of length at least 2d 

The sequence comparison problem is to compute (or estimate) d(R → S). If 
only character edits (inserts, deletes and changes) are allowed, then d(R → S) = 
d(S → R) becomes the well known Levenshtein edit distance [22]. Levenshtein
edit distance [22] or its weighted/normalized versions have been used for decades 
for measuring the functional and evolutionary similarity of genome sequences [31, 
1]. Our focus is on the general sequence comparison problem in presence of both 
character as well as segment edits. 
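
As a point of reference for the character-edit special case, the following is a minimal sketch of the standard dynamic program for Levenshtein distance. The function name `levenshtein` is a hypothetical choice for illustration; the sketch keeps only two rows of the table and does not handle segment edits.

```python
def levenshtein(r: str, s: str) -> int:
    """Minimum number of character insertions, deletions and
    replacements needed to turn r into s (unit costs assumed)."""
    # prev[j] = distance between the current prefix of r and s[:j]
    prev = list(range(len(s) + 1))
    for i, rc in enumerate(r, start=1):
        cur = [i]  # turning r[:i] into the empty string takes i deletions
        for j, sc in enumerate(s, start=1):
            cur.append(min(
                prev[j] + 1,               # delete rc
                cur[j - 1] + 1,            # insert sc
                prev[j - 1] + (rc != sc),  # replace rc by sc, or match
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

With character edits only, the value is the same in both directions, which is the symmetry noted above.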



1.2 Previous Work 

Previously, distances permitting segmental rearrangements have been investi- 
gated in a limited context. The bulk of the work is on distances that allow 
relocations or reversals only [2,3,15,16,8,9]. In this context each “character” rep- 
resents a long segment of the genome, most commonly a whole gene and thus 
occurs only once in the sequence, effectively making the sequence a permuta- 
tion of such characters. Thus these problems are reduced into special sorting 
problems. 

Distances permitting a larger spectrum of edit operations have been investigated 
only recently: examples include the transformation distance of Varré et al. [38] 
and its closely related companion, the compression distance of Li et al. [25,4,26]. 
These works received recent attention from the general scientific community [5] 
because of their demonstrated success in capturing the evolutionary relationships 
between world languages (through the comparison of the “declaration of human 
rights” in various languages [4]) and between DNA and RNA sequences, in 
particular the known mitochondrial DNA sequences of various species [25] and 
families of retrotransposons [38], as well as in identifying the authorship of 
works of literature [4]. These papers upper bound sequence “similarity” by a 
number of heuristics inspired by data compression methods [38,25,4,26]; however, 
the fundamental drawback of these works is that they do not provide any analysis 
of how well these heuristics measure the similarity with segment rearrangements.

Our sequence comparison problem is related to two problems that have at- 
tracted recent attention. 

The first one is the smallest grammar problem, which is to find the smallest 
context-free grammar that exactly generates the given string σ. This problem can 
be thought of as a sequence comparison problem between the empty string φ 
and σ, where the transformations are defined by the production rules of the
grammar. However, our problem is more general than what can be captured by 
a set of production rules in context-free languages. This is primarily because we 
allow deletions. 

^1 It is possible to obtain a symmetric version of a given distance via any commutative 
operation on d(R → S) and d(S → R); e.g., one can take their mean, minimum or 
maximum. 

For example, σ = aca, aba, aca can be obtained from φ by at most 6 edit 
operations: φ → b → ba → aba → aba, aba → aca, aba → σ. However, the 
smallest grammar has cost 9: σ → CBC, B → aba, C → aca. The cost of a 
grammar, as is standard, is the total number of symbols in the right hand side 
of all the production rules.

The best known approximation for the smallest grammar problem is within 
factor O(log n) [7]. Any significant improvement to this approximation would
require progress on a longstanding algebraic problem [24]. 

Second, there is a body of algorithms for estimating the block edit distance 
between two sequences which is quite similar to the segment rearrangement 
distance. The key difference is in the segment rearrangement operations that are 
allowed: in block edit distances, segment deletes are either not allowed at all [11] 
or are allowed only in a restricted way [10,29]. In [10,29], more specifically, any 
deletion can be performed only on a segment that has another copy elsewhere 
in the remainder of the sequence: the intuition is that such a delete operation is 
an inverse of a segment copy edit. Under this definition, the block edit distance 
between a long sequence σ and the empty sequence φ may be quite large. On 
the other hand, the segment rearrangement distance between any σ and φ is 
at most one. Other block edit distances such as the ones in [30,11,34] do not 
allow segment copies. There does not seem to be any biological justification for 
assuming such restrictions in segment rearrangements: both copies and arbitrary 
deletes have important roles [38,25,4,26]. Therefore, we focus on the unrestricted
segment rearrangement problem here. 

Block edit distances can be approximated by embedding strings to vector 
spaces and thereby reducing the segment rearrangement distances to vector 
norms: those methods provably do not work for the segment rearrangement 
problem. All known block edit distances are hard to compute exactly, and the 
best known algorithms yield an Ω(log n log* n) approximation at best.

1.3 Our Results 

Our main result is a linear time, greedy algorithm that we prove approximates 
the segment rearrangement distance between R and S to within a constant factor. 

This is the first known, provably approximate result for this problem. In 
contrast to our result, approximation factors for related problems such as the 
smallest grammar problem and block edit distance problems are at best Ω(log n).
Our approximation algorithm is quite simple to understand and implement, and 
is likely to find use in practice. 

We modify our result so that it can work in one-pass, utilizing small 
workspace; in particular, we show that our algorithm can be made to work in 
O(k + n/k) workspace with an O(k) factor increase in the approximation factor.

This is in the spirit of a number of recent results for processing massive data 
sets in the streaming model where the goal is to perform computations in one 
pass, with as small a workspace as possible (see the survey in [28]). Our result 
falls in this category. In computational genomics, it has long been the case that 
workspace proves to be a bottleneck in comparing sequences. For example, in 
a classical result, Hirschberg [14] showed that the longest common subsequence
problem can be implemented using only linear workspace; in contrast, other 
sequence comparison algorithms including the ones for standard edit distance 
computation use larger, even quadratic space, which is prohibitive. In the mas- 
sive genome context we study here, workspace becomes even more of a critical 
resource, and hence, our sublinear space results may prove not only useful in 
practice, but also necessary. 

Finally, our approach to solving this problem gives improved bounds for related 
problems such as computing the block edit distances.

For example, for a block edit distance problem, i.e., in the presence of non- 
overlapping character edits and block copies, relocations and restricted deletions 
on segments that have copies elsewhere, the best known previous algorithm 
had an approximation ratio O(log n log* n) [10,29]. We can get an O(1) factor
approximation for this problem as a byproduct of our main result. 



1.4 Technical Overview 

There are three different technical approaches to solving edit distance problems 
in general. Dynamic programming is useful for character edits, but typically does 
not apply for segment rearrangement distances. In [7] on the smallest grammar 
problem, Lempel-Ziv-77 compression method is used to parse a given sequence 
to construct a grammar in a tree-like manner. Our method here will also be a 
greedy parsing, but subtly different from the Lempel-Ziv-77 compression. Finally, 
in [10,29], sequences are parsed using a sophisticated algorithm to embed them 
into a high dimensional vector space. This approach provably does not work 
for estimating the segment rearrangement problem. In fact, our O(1) factor
approximation algorithm applies to sequence comparison, but not for embedding 
sequences in general. 

The technical crux of our work is the analysis of our greedy algorithm. The 
segment rearrangement distance from R to S restricts one to work on R and 
transform it to S via the smallest number of available segment and character edit 
operations. One can define two versions of this problem based on the “source” 
and the “destination” of segment copy operations allowed: the more restrictive, 
external version and the general, internal version. These are formalizations of 
the models implicitly assumed by [38] and [25,4,26] respectively. En route to our
main result, we prove fairly tight relationships between such versions of segment 
rearrangement problems. 

In Section 2, we define the problem formally in detail (including the “exter- 
nal” and “internal” versions). In Section 3, we present our results for the external 
version of the problem which are quite simple, but crucial for our main results in 
Section 4 on the “internal” segment rearrangement problem which is the general 
problem. Our small workspace version of the algorithm can be found in [13]. 



2 Approximating the Segment Rearrangement Distance

Recall that the segment rearrangement distance d(R → S) is the minimum
number of character and segment edit operations needed to transform sequence 
R into S. Since we allow unrestricted segment edits, we can drop character edits 
from special consideration henceforth since any character edit is also a segment 
edit. Our problem is to compute or estimate this distance. 

Any transformation that uses edit operations can be trivially emulated by 
another transformation which constructs S from an initially empty string S' 
while using R only as a repository of substrings which can be copied into S'.^2 
Henceforth, we will focus on such transformations only. Such a transformation 
is called internal if the repository also includes S' itself, i.e., segment copies 
(and thus segment relocations) within S' are allowed. A transformation is called 
external if the repository is set to be R throughout, i.e., any segment inserted 
anywhere in S' must be copied from R. In particular, in the external model, 
derived segments of S' that are not found in R cannot be reused in a segment
edit. 

Definition 1. We denote by b_int(R → S) the minimum number of segment edit 
operations (relocations, copies, and deletions) needed to transform R into S in 
the internal model. Similarly, we denote by b_ext(R → S) the minimum number 
of segment edit operations needed to transform R into S in the external model.

Both versions of the segment rearrangement distance above are NP-hard to 
compute as they reduce to the optimal dictionary selection problem [35]. 

Because of this computational hurdle, most biological work has resorted to 
using a simpler and computationally easier approach based on the Lempel-Ziv- 
77 data compression method. The Lempel-Ziv-77 method greedily partitions a 
string S into “phrases” from left to right such that each phrase S[i : j] for j > i
must occur previously in S. 

Definition 2. Denote by LZ(S) the number of Lempel-Ziv-77 phrases obtained 
in S. We define C_int(R → S), the internal compression distance from R to S, 
as the number of phrases in a parsing of S which can occur not only earlier in 
S but also anywhere in R. We define C_ext(R → S), the external compression 
distance from R to S, analogously as the number of greedy phrases in S each of 
which occurs as a substring in R only.

If R || S denotes the concatenation of R and S separated by a delimiter || which 
is not a part of the alphabet of S or R, then C_int(R → S) = LZ(R || S) − LZ(R).

The C_int(R → S) (resp. C_ext(R → S)) distance intuitively captures the following 
internal (resp. external) transformation from R to S: the only edit operations
used are insertion of the longest possible segment from R, S' (resp. R only), or 
a single character to the right end of (initially empty) S' such that S' continues 
to be a prefix of S. 
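
The greedy transformation just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the linear-time method: it uses naive substring search (quadratic or worse) instead of suffix trees, it requires an internal copy to come from the already built prefix without overlapping the new phrase, and the names `compression_distance`, `c_ext` and `c_int` are hypothetical. Phrase-boundary conventions for Lempel-Ziv parsing vary, so counts from this sketch need not agree with every convention.

```python
def compression_distance(r: str, s: str, internal: bool) -> int:
    """Count greedy left-to-right phrases of s.

    Each phrase is the longest segment starting at the current position
    that occurs in r (or, if internal is True, in the already built
    prefix of s); a single character is used when nothing longer matches.
    """
    phrases = 0
    i = 0
    while i < len(s):
        length = 1  # a single character can always be appended
        while i + length < len(s):
            candidate = s[i:i + length + 1]
            if candidate in r or (internal and candidate in s[:i]):
                length += 1  # a longer segment is still available
            else:
                break
        phrases += 1
        i += length
    return phrases


def c_ext(r: str, s: str) -> int:
    return compression_distance(r, s, internal=False)


def c_int(r: str, s: str) -> int:
    return compression_distance(r, s, internal=True)


q = "zyxwv"
s = "".join(["wv", "xw", "yx", "zy", "yxw", "xwv", "zyx", "zyxw", "yxwv", "zyxwv"])
print(c_int("vwxyz", q))  # 5
print(c_int(q, s))        # 10
```

Because any prefix of a matching segment also matches, extending the candidate one character at a time finds the longest available phrase.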

^2 To maintain the invariant that R is never changed, the emulation, if necessary, can 
start off by copying R as a whole into S'. 

It is important to note that both C_ext(R → S) and C_int(R → S) are computable 
in O(|R| + |S|) time via the use of (on-line) suffix tree construction algorithms 
[32]. Thus they provide easy to compute upper bounds to b_ext(R → S) 
and b_int(R → S) respectively. The main result of our paper is that these upper 
bounds are quite tight, i.e., they also provide lower bounds up to constant 
factors.

2.1 External Transformations 

We first focus on the external segment rearrangement distance b_ext(R → S).

Theorem 1. The external compression distance provides a 4 approximation to 
the external segment rearrangement distance, i.e., b_ext(R → S) ≤ C_ext(R → S) 
≤ 4 · b_ext(R → S). Therefore, there exists a linear time, 4 approximation 
algorithm for external segment rearrangement distance computation.

The proof is achieved through the following three claims. 

Claim. Let e_ext(R → S) be the distance based on the external transformation 
that allows only character insertions and segment copies. Then e_ext(R → S) 
approximates b_ext(R → S) within a factor of 2, i.e., b_ext(R → S) ≤ e_ext(R → S) 
≤ 2 · b_ext(R → S).

To see why the above claim holds, consider the deletion of a segment B 
in S. Segment B must have been created in S' through one or more segment 
copies and possibly other operations. These operations as well as the deletion 
of B can be emulated by only two segment copy operations: one involving the 
segment on the right boundary of B and the other involving the segment on the 
left boundary of B, excluding their characters that intersect with B. Thus the 
number of operations is preserved for this case. 

Now consider replacement of a character c in S' by another character. This 
can be emulated by a character insertion followed by a character deletion (which 
in turn can be emulated without changing the number of operations as shown 
above). The claim follows as segment relocations are not allowed in the external 
model. 

Claim. Let g_ext(R → S) be the distance which is defined in terms of the external 
transformation that allows segment copy and character insert operations only at 
the right and left boundaries of S. Then C_ext(R → S) = g_ext(R → S).

The correctness of the claim can be shown as follows. Let E = 
e(1), e(2), . . . , e(k) be the smallest sequence of segment copy and character insert 
operations in the external model applied to both boundaries of S', i.e., 
g_ext(R → S) = k. Among these edit operations let E_L = e_L(1), e_L(2), . . . , e_L(k_L) 
be those which are applied on the left boundary of S', maintaining the original 
order in E. Similarly, let E_R = e_R(1), e_R(2), . . . , e_R(k_R) be those operations 
which are applied to the right boundary of S', maintaining the original 
order in E. Obviously k = k_L + k_R. Since the two sequences of operations 
E_R and E_L do not interfere with each other, any interleaving of the 
two will yield the same string S. Then applying the interleaving sequence 
E' = e_L(k_L), e_L(k_L − 1), . . . , e_L(1), e_R(1), e_R(2), . . . , e_R(k_R) all to the right 
boundary of (the initially empty) S' constructs S as well. Such a transformation 
simply allows segment copies and character inserts only to the right boundary 
of S'. To minimize the number of such edit operations, the greedy choice for the 
next segment to be copied (to the right boundary of S') is clearly optimal: suppose 
that the optimal strategy constructs a longer prefix of S than the greedy 
strategy after i steps but not after i − 1 ≥ 0 steps. Then in the i-th step the 
greedy strategy must have copied a segment which is a proper substring of the 
segment B copied by the optimal strategy. This contradicts the greediness of 
step i as all suffixes of B are substrings of R as well; this completes the proof of 
the claim. Now we can complete the proof of the theorem by showing:

Claim. C_ext(R → S) ≤ g_ext(R → S) ≤ 2 · e_ext(R → S).

The claim follows from the fact that segment copy and character insert operations 
to arbitrary locations in S' can be reordered from left to right in a 
way that each segment copy or character insert operation which splits an earlier 
copied segment B into two portions B_1 and B_2 can be emulated by at most two 
operations in the transformation that yields g_ext(R → S): the original segment 
copy or the character insertion itself and another segment copy for B_2.

Note that the approximation factor of 2 is tight: Let R = x, y and S = x^i, y^i. 
It is easy to verify that C_ext(R → S) = g_ext(R → S) = 2i and e_ext(R → S) = i 
(consecutively insert R in the middle of S'). This completes the proof of 
Theorem 1.



2.2 Internal Transformations 

Now we focus on generalizing b_ext(R → S) and C_ext(R → S) to b_int(R → S) and 
C_int(R → S) respectively and proving that C_int(R → S) approximates b_int(R → S) 
up to a constant factor.

Theorem 2. Distance C_int(R → S) provides a 12 factor approximation to 
b_int(R → S), i.e., b_int(R → S) ≤ C_int(R → S) ≤ 12 · b_int(R → S). Therefore, 
there is a linear time algorithm for approximating b_int(R → S) within a 
constant factor.

As an illustration, consider the example: R = v, w, x, y, z, Q = z, y, x, w, v and 
S = w, v, x, w, y, x, z, y, y, x, w, x, w, v, z, y, x, z, y, x, w, y, x, w, v, z, y, x, w, v. 
The reader can verify that C_int(R → Q) = 5 and C_int(Q → S) = 10. Then, 
b_int(R → S) ≤ 5 + 10 + 1 = 16 because one can transform R into Q, Q into QS 
and finally, QS into S with a single deletion. In contrast, C_int(R → S) = 20.

One may think that as the length of S and R grow, “workspace” Q may 
help the ratio C_int(R → S) / min_Q (C_int(R → Q) + C_int(Q → S) + 1) grow in 
an unbounded manner. We prove below that there is an upper bound on this 
ratio, and this bound is used in the proof of the theorem above. The proof is 
achieved through the following steps. 

Claim. Let g_int(R → S) be the internal transformation distance that allows only 
segment copy and character insert operations and only at the right boundary of 
S'. Then C_int(R → S) = g_int(R → S).



The proof of this claim simply follows from the proof of Claim 2.1. 

Claim. Let e_int(R → S) be the internal transformation distance that permits 
only segment copy and character insertion operations to the right boundary of 
S' as well as one final prefix deletion operation. Then, e_int(R → S) ≤ C_int(R → S) 
≤ 3 · e_int(R → S).

The transformation giving rise to e_int(R → S) first constructs an intermediate 
string S' = P then constructs S' = PS. We will partition the construction 
of P into substeps i = 1, 2, . . . . Each substep i adds a new substring P_i to the 
right end of S' in such a way that P_i is constructed externally by copying substrings 
of R, P_1 . . . P_{i−1} to its right end. For any given k ≥ 0, one can now define 
e_int(R → S)_k in terms of a transformation which is similar to the one that defines 
e_int(R → S) with the extra constraint that the maximum number of partitions 
P_i is upper bounded by k. It is easy to see that e_int(R → S)_0 = C_ext(R → S).

Let P = argmin_Q (C_ext(R → Q) + C_int(R, Q → S)). Then clearly e_int(R → S)_1 
≤ C_ext(R → Q) + C_int(R, Q → S). Any Q can be written as a concatenation 
of substrings Q_1, Q_2, . . . , each of which is the result of a single segment copy 
from R. Let S be the concatenation of substrings S_1, S_2, . . . , each of which is 
the result of a segment copy from Q, of the form Q'_{i_1} Q_{i_2} . . . Q_{i_{j−1}} Q'_{i_j} for some j, 
where Q'_{i_1} (resp. Q'_{i_j}) is a suffix of Q_{i_1} (resp. prefix of Q_{i_j}). The transformation 
defining C_ext(R → S) can emulate the transformation defining e_int(R → S)_1 
by processing each S_i via subdividing it into smaller segments S_{i,1}, S_{i,2}, . . . and 
copying them either from R or from the already constructed prefix of S.

To bound the segment copies in the emulation, we use a charging scheme 
that creates three tokens for each copy operation related to e_int(R → S)_1 and 
then uses them to pay for the copy operations related to C_int(R → S). We create 
l, r, and c tokens, for “left”, “right”, and “center”, as follows. Each Q_i in Q 
gets a c token for being copied from R. Each segment S_i, for being copied from 
Q into S, gets a c, an r, and an l token. As we will show below, this creates 
C_ext(R → Q) + 3 · C_int(R, Q → S) tokens, which will imply that C_int(R → S) ≤ 
C_ext(R → Q) + 3 · C_int(R, Q → S).

Recall that each S_i in the transformation defining e_int(R → S)_1 is of the form 
Q'_{i_1} Q_{i_2} . . . Q_{i_{j−1}} Q'_{i_j} for some j. In the transformation defining C_int(R → S), 
multiple copies will be made to emulate the copying of S_i. Here is how the 
charging works. Every time a segment S = S'(j) is either copied from R, covering 
a Q_i entirely, or copied from S', covering an S_k entirely, we use up the c token for 
that Q_i or S_k (if multiple are covered, we use an arbitrary one). While simulating 
the construction of an S_i, if one needs to copy a prefix (resp. suffix) of some Q_j, 
the r (resp. l) token of S_i is used.

The intuition behind this charging is that the r and l tokens will be used 
for (few) characters at the two ends of a segment corresponding to an S_i in the 
compression transformation and a c token will be used for copying whole Q_i or 
S_i segments. We observe that there will always be a segment for what needs 
to be copied in the compression transformation that has an unused c token. 
This is due to the fact that a c token is only used up in simulating an S_i that 
entirely covers the segment that “owns” that token. However, the construction 
of S_i in e_int(R → S)_1 creates another c token itself that provably no other copy 
operations will request.

One can use induction on k to show by a similar argument that for any string 
P = P_1 . . . P_k, C_int(R → S) ≤ C_int(R → P_1, . . . , P_{k−1}) + 3 · C_ext(R, P_1 . . . P_{k−1} → 
P_k) + 3 · C_int(R, P_1 . . . P_k → S). 

Because e_int(R → S)_k = min_{P = P_1, . . . , P_k} [(Σ_{i=1}^{k} C_ext(R, P_1 . . . P_{i−1} → P_i)) + 
C_int(R, P → S)], for any k, this implies that C_int(R → S) ≤ 3 · e_int(R → S), 
which completes the proof of Claim 2.2.

We finish off the proof of the theorem by the following claim. 

Claim. e_int(R → S) ≤ 4 · b_int(R → S).

Consider the transformation defining b_int(R → S). Let R_0 = R and let the 
result of the i-th segment edit operation give the string R_i. Thus if b_int(R → 
S) = k, then R_k = S. We emulate each operation performed on R during the 
transformation via at most 4 operations permitted in the transformation defining 
e_int(R → S). 

We again start with R'_0 = R and iteratively construct R'_i = R'_{i−1} R_i = 
R_0, R_1, . . . , R_i.

(1) If the i-th operation performed by the transformation is deletion of segment 
R_i[k : l], then the transformation defining e_int(R → S) copies the segments 
R'_i[1 : k − 1] and R'_i[l + 1 : |R'_i|] from R'_i to its end so as to obtain R'_{i+1} = R'_i, R_{i+1}. 

(2) Similarly, if the i-th operation copies segment R'_i[l : h] in between R'_i[k − 1] 
and R'_i[k], then the transformation defining e_int(R → S) copies the segments 
R'_i[1 : k], R'_i[l : h] and R'_i[k + 1 : |R'_i|]. 

(3) Finally, if the i-th operation relocates segment R'_i[l : h] in between R'_i[k − 1] and 
R'_i[k] (WLOG, assume that k < l), then the transformation defining e_int(R → S) 
copies the segments R'_i[1 : k], R'_i[l : h], R'_i[k + 1 : l − 1] and R'_i[h + 1 : |R'_i|].

The above claim implies that C_int(R → S) approximates b_int(R → S) (and 
thus provides a lower bound to any distance in the internal model) within a 
factor of 12, completing the proof of Theorem 2. □
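
The emulation in cases (1) to (3) can be illustrated with a short sketch. It is only an illustration under stated assumptions: positions are 0-based Python slices rather than the 1-based indices of the text, the names `emulate` and `simulate` and the tuple encoding of operations are hypothetical, and empty segments are simply skipped. Each edit on the current string is replaced by at most 4 copies appended at the right end of the workspace.

```python
def emulate(cur: str, op: tuple) -> list:
    """Return the segments of cur that, appended in order at the right
    end, spell the string obtained by applying op to cur."""
    kind = op[0]
    if kind == "delete":      # remove cur[k:l]
        _, k, l = op
        return [cur[:k], cur[l:]]
    if kind == "copy":        # insert a copy of cur[l:h] at position k
        _, l, h, k = op
        return [cur[:k], cur[l:h], cur[k:]]
    if kind == "relocate":    # move cur[l:h] to position k, assuming k <= l
        _, l, h, k = op
        return [cur[:k], cur[l:h], cur[k:l], cur[h:]]
    raise ValueError(f"unknown operation: {kind}")


def simulate(r: str, ops: list):
    """Apply ops using right-end copies only; return the final string,
    the number of copies used, and the whole workspace."""
    workspace, cur, copies = r, r, 0
    for op in ops:
        segments = [seg for seg in emulate(cur, op) if seg]  # skip empty copies
        copies += len(segments)
        cur = "".join(segments)
        workspace += cur  # the new version is appended after all older ones
    return cur, copies, workspace


final, copies, workspace = simulate(
    "abcdef", [("delete", 1, 3), ("copy", 0, 1, 4), ("relocate", 2, 4, 0)]
)
print(final, copies)  # efada 7
```

One final prefix deletion on the workspace then leaves only the last version, which is the target string.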

Using the proof of the previous theorem, it will follow that 

Corollary 1. Given any two distances f(· → ·) and g(· → ·) in the internal 
model where character insertions and segment copies are allowed, f(R → S) ≤ 
φ · g(R → S) for some constant φ for all strings R, S. 

This improves the O(log n) approximation in [10,29] for this special case of the 
segment rearrangement distance problem. 



3 Concluding Remarks

We believe that the constants in our approximation factor can be improved by 
a more detailed analysis of the relationship between segment rearrangements. A 
recent work reports some progress on improving the constant factor in the ap- 
proximation [34]. It is wide open how to prove lower bounds for our problem. For 
example, recently [17], a small constant lower bound was proved for embedding 
the Levenshtein edit distance into L_1 and L_2. We can prove a matching lower 
bound for string similarity based on segment rearrangements as well. However, 
that does not shed light on the general hardness of comparing sequences pairwise. 
Another open problem is to determine if there exists a sequence embedding 
that gives an o(log n) approximation of segment edit distances between any pair of 
sequences. 



Abstract. We provide an algebraic characterization of the expressive 
power of various naturally defined logics on finite trees. These logics are 
described in terms of Lindström quantifiers, and particular cases include 
first-order logic and modular logic. The algebraic characterization we give
is expressed in terms of a new algebraic structure, finitary preclones, and 
uses a generalization of the block product operation. 



1 Introduction 

The notion of recognizability emerged in the 1960s (Eilenberg, Mezei, Wright, 
and others, cf. [12,22]) and has been the subject of considerable attention since, 
notably because of its close connections with automata-theoretic formalisms and 
with logical definability, cf. [4,9,13,30] for some early papers. 

Recognizability was first considered for sets (languages) of finite words, cf. 
[11] and the references contained in op. cit. The general idea is to use the alge- 
braic structure of the domain, say, the monoid structure on the set of all finite 
words, to describe some of its subsets. More precisely, a subset of an algebra is 
said to be recognizable if it is a union of classes in a (locally) finite congruence. 
The same concept was adapted to the case of finite trees, traces, finite graphs, 
etc, cf. [12,22,8,6]. 

It follows rather directly from this definition of (algebraic) recognizability 
that a finite - or finitary - algebraic structure can be canonically associated 
with each recognizable subset L, called its syntactic structure. Moreover, the 
algebraic properties of the syntactic structure of L reflect its combinatorial and 
logical properties. The archetypal example is that of star-free languages of finite 
words: they are exactly the languages whose syntactic monoid is aperiodic, cf. 
[26]. They are also exactly the languages that can be defined by a first-order 
(FO) sentence, cf. [21], and the languages that can be defined by a temporal 
logic formula, cf. [18,16,5]. In particular, if we want to decide whether a given 
regular language L is FO-definable, we do not know any algorithm that does
not, in one form or another, verify that the syntactic monoid of L is aperiodic. 

An important open problem is the analogous question concerning languages 
of finite trees [24]: can we decide whether a regular tree language is FO-definable? 
Based on the existing literature, it is tempting to guess that an answer to this 
problem could be found using algebraic methods. A central motivation for this
paper is to present an algebraic framework which allows a nice characterization 
of FO-definable tree languages. Let us say immediately that we do not know yet 
whether this characterization can be turned into a decision algorithm! 

Let Σ be a ranked alphabet. The set of Σ-labeled trees can be seen in a 
natural way as a (free) Σ-algebra, where Σ is now seen as a signature. It has been 
known since [12,22,9] that the regular tree languages are exactly the recognizable 
subsets of this Σ-algebra. We refer the reader to [17,23,24] for attempts to use
this algebraic framework and some of its variants to characterize FO-definable 
tree languages. In this paper, we propose a different algebraic view - which 
preserves however the recognizable sets of trees. 

More precisely, we consider algebras called preclones (they lack some of the 
operations and axioms of clones [7]). Precise definitions are given in Section 2.1. 
Let us simply say here that, in contrast with the more classical monoids or 
Σ-algebras, preclones have infinitely many sorts, one for each integer n ≥ 0. 
As a result, there is no nontrivial finite preclone. The corresponding notion is 
that of finitary preclones, which have a finite number of elements of each sort. 
An important class of preclones is given by the transformations T(Q) of a set 
Q. The elements of sort (or rank) n are the mappings from Q^n into Q and the 
(preclone) composition operation is the usual composition of mappings. Note 
that T(Q) is finitary if Q is finite.

It turns out that the finite Σ-labeled trees can be identified with the 0-sort of 
the free preclone generated by Σ. The naturally defined syntactic preclone of a 
tree language L is finitary if and only if L is regular. In fact, if S is the syntactic 
Σ-algebra of L, the syntactic preclone is the sub-preclone of T(S) generated by 
the elements of Σ (if σ ∈ Σ is an operation of rank r, it defines a mapping from 
S^r into S, and hence an element of sort r in T(S)). Note that this provides an 
effectively constructible description of the syntactic preclone of L.

One can develop the expected theory of varieties of recognizable tree lan- 
guages and pseudovarieties of preclones, leading to an Eilenberg-type variety 
theorem, related to that presented in [14]. This requires combinatorially much 
more complex proofs than in the classical settings, and we give a brief overview 
of this set of results, as needed for the sequel of the paper. 

However our main concern in this paper is to give algebraic characterizations 
of certain natural logically defined classes of tree languages. A representative 
example of such classes is that of the FO-definable tree languages, but our results 
also apply to the (FO + MOD)-definable tree languages of [23,29] and many
other classes. The common point of these classes of formulas is that they use 
new quantifiers, each of which is based on a regular tree language. For instance, 
the usual existential quantifier is associated with the language of those trees 
containing at least one vertex labeled by a symbol corresponding to the truth 
value 1. (See Example 10 for a more precise description). 

The algebraic characterization which we obtain uses a notion of block product 
(or 2-sided wreath product) of preclones, inspired by Rhodes and Tilson’s block 
product [25] and Eilenberg’s bimachines [11]. 

Technically, if K is a family of regular tree languages and V is the pseudovariety 
of preclones generated by the syntactic preclones of the elements of K, if 
Lind(K) is the formal logic whose formulas use the language of Σ-labeled trees, 
the Boolean connectives and the Lindström quantifiers [19,10] associated with 
the languages of K, then a regular tree language is Lind(K)-definable if and only 
if its syntactic preclone lies in the least pseudovariety of preclones containing V 
and closed under block product. To be completely accurate, the preclones in this 
statement must formally be accompanied by a designated subset of generators, 
and K and Lind(K) must satisfy certain simple closure properties.

Returning to FO-definable tree languages, this tells us that a tree language 
is FO-definable if and only if its syntactic preclone lies in the least pseudovariety 
closed under block product and containing the sub-preclone of the preclone of 
transformations of the two-element set {0, 1}, generated by the binary or function 
and the (nullary) constants 0, 1. As pointed out earlier, we do not know whether 
this yields a decidability proof for FO-definable tree languages, but it constitutes 
at least an avenue to be explored in the search for such a decision procedure. 

In order to keep this paper within the required format, full proofs are reserved 
for a later publication. 



2 The Algebraic Framework 

Let $Q$ be a set and let $T_n(Q)$ denote the set of $n$-ary transformations of $Q$, that
is, mappings from $Q^n$ to $Q$. Let then $T(Q) = (T_n(Q))_{n \geq 0}$, called the preclone
of transformations of $Q$. The set $T_1(Q)$ of transformations of $Q$ is a monoid
under the composition of functions. Composition can be considered on $T(Q)$
in general: if $f \in T_n(Q)$ and $g_i \in T_{m_i}(Q)$ ($1 \leq i \leq n$), then the composite
$f(g_1, \ldots, g_n)$, defined in the natural way, is an element of $T_m(Q)$ where $m = \sum_{i \in [n]} m_i$.
This composition operation and its associativity properties are exactly
what is captured in the notion of a preclone.
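As an illustration of this composition, here is a minimal Python sketch that models an $n$-ary transformation as an ordinary function together with its rank. The helper name `compose` and the encoding of each $g_i$ as a pair (function, rank) are assumptions made for this sketch only; they are not notation from the paper.

```python
from itertools import product

def compose(f, n, gs):
    """Return (h, m) with h = f(g_1, ..., g_n) in the preclone sense.

    f is an n-ary function; gs is a list of n pairs (g_i, m_i), where g_i
    is m_i-ary. Each of the m = m_1 + ... + m_n arguments of h is consumed
    by exactly one g_i, in left-to-right order.
    """
    assert len(gs) == n
    m = sum(m_i for _, m_i in gs)

    def h(*args):
        assert len(args) == m
        pos, inner = 0, []
        for g, m_i in gs:
            inner.append(g(*args[pos:pos + m_i]))
            pos += m_i
        return f(*inner)

    return h, m

# Illustrative transformations of Q = {0, 1}
or2 = lambda x, y: x | y      # binary
ident = lambda x: x           # unary identity
one0 = lambda: 1              # nullary constant

# or2 composed with (or2, ident) is a ternary function
h, m = compose(or2, 2, [(or2, 2), (ident, 1)])
is_ternary_or = all(h(x, y, z) == (x | y | z)
                    for x, y, z in product((0, 1), repeat=3))
```

The rank of the composite is the sum of the ranks of the inner functions, which is exactly the bookkeeping that distinguishes this composition from the one used for clones.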

Remark 1. Preclones are an abstraction of sets of n-ary transformations of a 
set, which generalizes the abstraction from transformation monoids to monoids. 
Clones [7], or equivalently, Lawvere theories [3,14], are another such abstraction,
more classical. We will not take the space to discuss the differences between
clones and preclones, simply pointing out here that each of the $m$ arguments
of the composite $f(g_1, \ldots, g_n)$ above is used in exactly one of the $g_i$, in contrast
with the definition of the clone of transformations of $Q$. Readers interested in this
comparison will have no difficulty tracing those differences in the sequel. The
category of preclones is equivalent to the category of strict monoidal categories 
[20] or magmoids [2] “generated by a single object”. 



2.1 Preclones 

A preclone is a many-sorted algebra $S = ((S_n)_{n \geq 0}, \cdot, 1)$, where $n$ ranges over the
nonnegative integers, equipped with a composition operation $\cdot$ such that for each
$f \in S_n$ and $g_1 \in S_{m_1}, \ldots, g_n \in S_{m_n}$, $\cdot(f, g_1, \ldots, g_n) \in S_m$ where $m = \sum_{i \in [n]} m_i$.

We usually write $f \cdot (g_1 \oplus \cdots \oplus g_n)$ for $\cdot(f, g_1, \ldots, g_n)$. The constant $1$ is in $S_1$.
We require the following three equational axioms:

$$(f \cdot (g_1 \oplus \cdots \oplus g_n)) \cdot (h_1 \oplus \cdots \oplus h_m) = f \cdot ((g_1 \cdot \bar{h}_1) \oplus \cdots \oplus (g_n \cdot \bar{h}_n)), \tag{1}$$

where $f, g_1, \ldots, g_n$ are as above, $h_j \in S_{k_j}$, $j \in [m]$, and
$\bar{h}_i = h_{m_1 + \cdots + m_{i-1} + 1} \oplus \cdots \oplus h_{m_1 + \cdots + m_i}$, $i \in [n]$, and

$$1 \cdot f = f \tag{2}$$

$$f \cdot (1 \oplus \cdots \oplus 1) = f, \tag{3}$$

where $f \in S_n$ and $1$ appears $n$ times on the left hand side of the last equation.
An element of $S_n$ is said to have rank $n$.

The notions of morphism between preclones, sub-preclone, congruence and 
quotient are defined as usual. Note that a morphism maps elements of rank n to 
elements of the same rank, and that a congruence only relates elements of the 
same rank. It is not difficult to establish the following. 

Fact 2. Every preclone can be embedded in a preclone of transformations.

We say that a preclone $S$ is finitary if each $S_n$ is finite. For instance, if $Q$
is a finite set, then $T(Q)$ is finitary. Note that a finitary preclone $S$ does not
necessarily embed in the transformation preclone of a finite set. 

For technical reasons it is sometimes preferable to work with generated preclones
(gp’s), consisting of a pair $(S, A)$ where $S$ is a preclone, $A$ is a nonempty
subset of S, and S is generated by A. The notions of morphisms and congruences 
must be revised accordingly: in particular, a morphism of gp’s from (S, A) to 
(T, B) must map A into B. A gp (S, A) is said to be finitary if S is finitary and 
A is finite. 

Example 3. Let $\Sigma$ be a ranked alphabet, so that $\Sigma$ is a finite set of ranked
symbols, and let $Q$ be a $\Sigma$-algebra. Recall that $Q$ can also be described as (the
set of states of) a tree automaton accepting $\Sigma$-labeled trees. The elements of
$\Sigma$ of rank $n$ can be viewed naturally as elements of $T_n(Q)$. The preclone they
generate within $T(Q)$, together with $\Sigma$, is called the gp associated with $Q$.
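To make the view of an algebra as a tree automaton concrete, the following small Python sketch evaluates a variable-free tree bottom-up. The tuple encoding of trees, the name `evaluate`, and the particular two-state algebra are hypothetical choices for illustration; they do not come from the paper.

```python
def evaluate(tree, interp):
    """Bottom-up value of a variable-free tree in an algebra.

    A tree is a tuple (label, child_1, ..., child_n); interp maps each
    label of rank n to an n-ary function on the state set Q.
    """
    label, children = tree[0], tree[1:]
    return interp[label](*(evaluate(c, interp) for c in children))

# Hypothetical algebra on Q = {0, 1}: state 1 means "a leaf b occurs".
interp = {
    "a": lambda: 0,           # rank 0
    "b": lambda: 1,           # rank 0
    "s": lambda x, y: x | y,  # rank 2
}

t = ("s", ("a",), ("s", ("b",), ("a",)))
value = evaluate(t, interp)   # state reached at the root
```

Each letter of rank $n$ acts as an $n$-ary transformation of the state set, which is the sense in which the letters generate a sub-preclone of the transformation preclone.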



2.2 Trees and Free Preclones 

Let $\Sigma$ be a ranked alphabet and let $(v_k)_{k \geq 1}$ be a sequence of variable names.
We let $\Sigma M_n$ be the set of finite trees whose inner nodes are labeled by elements
of $\Sigma$ (according to their rank), whose leaves are labeled by elements of
$\Sigma_0 \cup \{v_1, \ldots, v_n\}$, and whose frontier (the left to right sequence of variables appearing
in the tree) is the word $v_1 \cdots v_n$: that is, each variable occurs exactly once, and
in the natural order. Note that $\Sigma M_0$ is the set of finite $\Sigma$-labeled trees. We let
$\Sigma M = (\Sigma M_n)_n$.

The composite tree $f \cdot (g_1 \oplus \cdots \oplus g_n)$ ($f \in \Sigma M_n$) is obtained by substituting
the root of the tree $g_i$ for the variable $v_i$ in $f$, and renumbering consecutively
the variables in the frontiers of $g_1, \ldots, g_n$. Let also $1 \in \Sigma M_1$ be the tree with a
single vertex (labeled $v_1$). Then $(\Sigma M, \cdot, 1)$ is a preclone.
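The substitution that defines the composite tree can be sketched in a few lines of Python. In this sketch a tree is a nested tuple, a variable leaf is represented by `None`, and the index of a variable is simply its position in the frontier, so the renumbering is implicit. These encoding choices and the names `rank` and `compose` are assumptions made for illustration.

```python
VAR = None  # a variable leaf; its index is its position in the frontier

def rank(t):
    """Number of variable leaves of t, i.e. the n such that t has rank n."""
    if t is VAR:
        return 1
    return sum(rank(c) for c in t[1:])

def compose(f, gs):
    """Plug the tree g_i into the i-th variable of f, left to right."""
    if rank(f) != len(gs):
        raise ValueError("need exactly one tree per variable of f")
    pending = iter(gs)

    def subst(t):
        if t is VAR:
            return next(pending)
        return (t[0],) + tuple(subst(c) for c in t[1:])

    return subst(f)

ONE = VAR  # the identity tree: a single vertex labeled by a variable

# "s" has rank 2 and "a" has rank 0 in this hypothetical alphabet
f = ("s", VAR, ("s", VAR, ("a",)))
g1 = ("a",)
g2 = ("s", VAR, VAR)
h = compose(f, [g1, g2])
```

Here `f` has rank 2, `g1` has rank 0 and `g2` has rank 2, so the composite `h` has rank 0 + 2 = 2. The identity tree behaves as a unit on both sides, in line with the unit axioms of a preclone.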

Each letter $\sigma \in \Sigma$ of rank $n$ can be identified with the tree with root labeled
$\sigma$, where the root’s immediate successors are leaves labeled $v_1, \ldots, v_n$. Then
every rank-preserving map from $\Sigma$ to a preclone $S$ can be extended in a unique
fashion to a preclone morphism from $\Sigma M$ into $S$. That is:

Fact 4. $\Sigma M$ is the free preclone generated by $\Sigma$, and $(\Sigma M, \Sigma)$ is the free gp
generated by $\Sigma$.

Note that $\Sigma M_n$ is nonempty for all $n \geq 0$ exactly when $\Sigma_0$ and at least one
$\Sigma_n$ with $n > 1$ are nonempty. Below we will only consider such ranked sets.
Moreover, we will only consider preclones $S$ such that $S_n$ is nonempty for all
$n \geq 0$.



3 Recognizable Tree Languages 

The algebraic framework described in Section 2 leads naturally to a definition 
of recognizable languages: a subset $L$ of $\Sigma M_k$ is recognizable if there exists a
morphism $\alpha$ from $\Sigma M$ to a finitary preclone $S$ such that $L = \alpha^{-1}\alpha(L)$. As usual,
the notion of recognizability can be expressed equivalently by stating that $L$ is
saturated by some locally finite congruence on $\Sigma M$ (a congruence is locally finite
if it has finite index on each sort). 

If $L \subseteq \Sigma M_k$ is any tree language, recognizable or not, then there is a coarsest
congruence saturating it, called the syntactic congruence of $L$. This congruence
can be described as follows. First, an $m$-ary context in $\Sigma M_k$ is a tuple
$(u, k_1, k_2, v)$ where

• $v$ is an $m$-tuple $(v_1, \ldots, v_m)$, written $v = v_1 \oplus \cdots \oplus v_m$, with $v_i \in \Sigma M_{\ell_i}$, $1 \leq i \leq m$,

• $u \in \Sigma M_{k_1 + 1 + k_2}$, and

• $k = k_1 + \ell + k_2$ with $\ell = \sum_{i=1}^{m} \ell_i$.

$(u, k_1, k_2, v)$ is an $L$-context of an element $f \in \Sigma M_m$ if $u \cdot (\mathbf{k}_1 \oplus f \cdot v \oplus \mathbf{k}_2) \in L$.
Here $\mathbf{n}$ denotes the $\oplus$-sum of $n$ terms equal to $1$. Then, for each $f, g \in \Sigma M_m$,
we let $f \sim_L g$ iff $f$ and $g$ have the same $L$-contexts. We denote by $(M_L, \Sigma_L)$ the
quotient gp $(\Sigma M / {\sim_L}, \Sigma / {\sim_L})$, called the syntactic gp of $L$. $M_L$ is the syntactic
preclone of $L$ and the projection morphism $\eta_L \colon \Sigma M \to M_L$ is the syntactic
morphism of $L$.

Fact 5. The congruence $\sim_L$ of a language $L \subseteq \Sigma M_k$ is the coarsest preclone
congruence that saturates $L$. In other words, if $\alpha \colon \Sigma M \to S$ is a preclone morphism,
$L = \alpha^{-1}\alpha(L)$ if and only if $\alpha$ can be factored through $\eta_L$. In particular,
$L$ is recognizable if and only if $\sim_L$ is locally finite, if and only if $M_L$ is finitary.

One can also show the following proposition, relating preclone recognizability
with the usual notion of regular tree languages.

Proposition 6. The syntactic gp of a tree language $L \subseteq \Sigma M_0$ is the gp associated
with the syntactic $\Sigma$-algebra of $L$. In particular, $L$ is recognizable if and
only if $L$ is a regular tree language.

While not difficult, this result is important because it shows that we are not 
introducing a new class of recognizable tree languages: we are simply associating 
with each regular tree language a finitary algebraic structure which is richer 
than its syntactic $\Sigma$-algebra (a.k.a. minimal deterministic tree automaton). The
proposition implies that the syntactic gp of a recognizable tree language has an 
(effectively computable) finite presentation. 

One can define pseudovarieties of preclones as those nonempty classes of
finitary preclones closed under direct product, sub-preclones, quotients, finitary
unions of $\omega$-chains and finitary inverse limits of $\omega$-sequences. (The latter two
constructs are needed because preclones have an infinite number of sorts.) Here,
we say that a union $T = \bigcup_n T_n$ of an $\omega$-chain of preclones $T_n$, $n \geq 0$, is finitary
exactly when $T$ is finitary. Finitary inverse limits $\lim_n T_n$ of $\omega$-diagrams
$h_n \colon T_{n+1} \to T_n$, $n \geq 0$, are defined in the same way. Also, one can define pseudovarieties
of gp’s as those classes of finitary gp’s closed under direct product,
sub-preclones, quotients and finitary inverse limits of $\omega$-sequences. (Closure under
finitary unions of $\omega$-chains comes for free, since all finitary gp’s are finitely
generated.)

Suppose now that $\mathcal{V}$ is a nonempty class of recognizable tree languages
$L \subseteq \Sigma M_k$, where $\Sigma$ is any finite ranked set and $k \geq 0$. We call $\mathcal{V}$ a variety of tree
languages, or a tree language variety, if it is closed under the Boolean operations,
inverse morphisms between free preclones generated by finite ranked sets, and
quotients defined as follows. Let $L \subseteq \Sigma M_k$ be a tree language and let $(u, k_1, k_2, v)$
be an $m$-ary context in $\Sigma M_k$. Then the left quotient $(u, k_1, k_2)^{-1}L$ and the right
quotient $Lv^{-1}$ are defined by

$$(u, k_1, k_2)^{-1}L = \{t \in \Sigma M_n \mid k_1 + n + k_2 = k,\ u \cdot (\mathbf{k}_1 \oplus t \oplus \mathbf{k}_2) \in L\}$$

$$Lv^{-1} = \{t \in \Sigma M_m \mid t \cdot v \in L\}.$$

A literal variety of tree languages is defined similarly, but instead of closure 
under inverse morphisms between finitely generated free preclones we require 
closure under inverse morphisms between finitely generated free gp’s. Thus, if
$L \subseteq \Sigma M_k$ is in a literal variety $\mathcal{V}$ and $h \colon \Delta M \to \Sigma M$ is a preclone morphism
with $h(\Delta) \subseteq \Sigma$, where $\Delta$ is finite, then $h^{-1}(L)$ is also in $\mathcal{V}$.

Below we present an Eilenberg correspondence between pseudovarieties of 
preclones (gp’s respectively), and varieties (literal varieties) of tree languages. 
For each pseudovariety $\mathbf{V}$ of preclones (resp. gp’s), let $\mathcal{V}$ denote the class of those
tree languages $L \subseteq \Sigma M_k$, where $\Sigma$ is any ranked alphabet and $k \geq 0$, whose
syntactic preclone (syntactic gp, resp.) belongs to $\mathbf{V}$.

Theorem 7. The correspondence $\mathbf{V} \mapsto \mathcal{V}$ defines an order isomorphism between
pseudovarieties of preclones (gp’s, resp.) and tree language varieties (literal varieties
of tree languages, resp.).

Remark 8. Two further variety theorems for finite trees exist in the literature.
One uses minimal tree automata as syntactic algebra [1,27], and the other uses
syntactic Lawvere theories, i.e. clones [14]. No variety theorem is known for the
3-sorted algebras proposed in [31].

4 Logically Defined Tree Languages 

Let $\Sigma$ be a ranked alphabet. We will define subsets of $\Sigma M_k$ by means of logical
formulas. Our atomic formulas are of the form

$P_\sigma(x)$, $x < x'$, $\mathrm{Succ}_i(x, x')$, $\mathrm{left}_j(x)$ and $\mathrm{right}_j(x)$

where $\sigma \in \Sigma$, $i, j$ are positive integers, $i$ is less than or equal to the maximal
rank of a letter in $\Sigma$, and $x, x'$ are first-order variables. If $k$ is an integer, subsets
of $\Sigma M_k$ will be defined by formulas of rank $k$, composed using atomic formulas
(with $j \in [k]$), the Boolean constants false and true, the Boolean connectives and
a family of generalized quantifiers called Lindström quantifiers, defined below.

When a formula is interpreted on a tree $t \in \Sigma M_k$, first-order variables are
interpreted as vertices of $t$, $P_\sigma(x)$ holds if $x$ is labeled $\sigma$ ($\sigma \in \Sigma$), $x < x'$ holds if
$x'$ is a proper descendant of $x$, and $\mathrm{Succ}_i(x, x')$ holds if $x'$ is the $i$-th successor
of $x$. Finally, $\mathrm{left}_j(x)$ holds (resp. $\mathrm{right}_j(x)$ holds) if the index of the highest
numbered variable labeling a leaf to the left (resp. right) of the frontier of the
subtree rooted at $x$ is $j$.

We now proceed with the definition of (simple) Lindström quantifiers, adapted
from [19,10] to the case of finite trees. Let $\Delta$ be a ranked alphabet containing
letters of rank $m$ for each $m$ such that $\Sigma_m \neq \emptyset$, and let $K \subseteq \Delta M_k$. Let $x$ be
a first-order variable. We describe the interpretation of the quantified formula
$Q_K x . \langle \varphi_\delta \rangle_{\delta \in \Delta}$, where the quantifier $Q_K$ binds the variable $x$; here $\langle \varphi_\delta \rangle_{\delta \in \Delta}$ is a
family of (previously defined) formulas on $\Sigma$-trees which is deterministic w.r.t.
$x$. We may assume that $x$ is not bound in the $\varphi_\delta$. Deterministic means that for
each $t \in \Sigma M_k$, for each $m$ such that $\Delta_m \neq \emptyset$, and for each interpretation $\lambda$ of the
free variables in the $\varphi_\delta$ mapping $x$ to a vertex of $t$ labeled in $\Sigma_m$, $(t, \lambda)$
satisfies exactly one of the $\varphi_\delta$, $\delta \in \Delta_m$.

Given this family $\langle \varphi_\delta \rangle$, a tree $t \in \Sigma M_k$ and an interpretation $\lambda$ of the free
variables in the $\varphi_\delta$ except for $x$, we construct a tree $t_\lambda \in \Delta M_k$ as follows: the
underlying tree structure of $t_\lambda$ is the same as that of $t$, and the vertices labeled
$v_j$ ($j \in [k]$) are the same in both trees. For each vertex $v$ of $t$ labeled by $\sigma \in \Sigma_m$,
let $\lambda'$ be the interpretation obtained from $\lambda$ by mapping variable $x$ to vertex $v$:
then the same vertex $v$ in $t_\lambda$ is labeled by the element $\delta \in \Delta_m$ such that $(t, \lambda')$
satisfies $\varphi_\delta$.

Finally, we say that $(t, \lambda)$ satisfies $Q_K x . \langle \varphi_\delta \rangle_{\delta \in \Delta}$ if $t_\lambda \in K$.

Example 9. Let $\Delta$ be a ranked alphabet such that each $\Delta_n$ is either empty or
equal to $\{0_n, 1_n\}$ (such an alphabet is called Boolean), and let $k \geq 0$.

(1) Let $K = K(\exists)$ denote the (recognizable) language of all trees in $\Delta M_k$
containing at least one vertex labeled $1_n$ (for some $n$). Then the Lindström
quantifier corresponding to $K$ is a sort of existential quantifier. More precisely,
let $\langle \varphi_\delta \rangle_{\delta \in \Delta}$ be a collection of formulas: let us write $\varphi_n$ for $\varphi_{1_n}$ and note that
$\varphi_{0_n}$ is equivalent to $\neg \varphi_n$. Now let $t \in \Sigma M_k$ and let $\lambda$ be an interpretation of
the free variables in the $\varphi_\delta$ except for $x$. Then $(t, \lambda)$ satisfies $Q_K x . \langle \varphi_\delta \rangle_{\delta \in \Delta}$ if
and only if, for some $n$, $\varphi_n$ is satisfied by $(t, \lambda')$ for some extension $\lambda'$ of $\lambda$ which
maps variable $x$ to a vertex of rank $n$.

For instance, if $\Sigma$ consists only of constants and one symbol of rank 2 and if
$\varphi_0 = \mathrm{false}$, then $(t, \lambda)$ satisfies $Q_K x . \langle \varphi_\delta \rangle_{\delta \in \Delta}$ if and only if $\varphi_2$ is satisfied by some
$(t, \lambda')$ where $\lambda'$ extends $\lambda$ by mapping variable $x$ to a vertex of rank 2 of $t$. In
particular, if $x$ is the only free variable in the $\varphi_\delta$, then $t$ satisfies $Q_K x . \langle \varphi_\delta \rangle_{\delta \in \Delta}$
if and only if there exists $x$, a vertex of rank 2 of $t$ which satisfies $\varphi_2(x)$.

(2) In the same manner, if $p > 1$, $r < p$ and $K = K(\exists_p^r)$ denotes the (recognizable)
language of those trees in $\Delta M_k$ such that the number of vertices labeled
$1_n$ (for some $n$) is congruent to $r$ modulo $p$, then the Lindström quantifier $Q_K$
is a modular quantifier.

(3) Let $K = K(\exists_{\mathrm{path}})$ be the set of all trees in $\Delta M_k$ such that all the inner
vertices along at least one maximal path from the root to a leaf are labeled $1_n$
(for the appropriate $n$). Then $Q_K$ is an existential path quantifier.

(4) Let $K_{\mathrm{next}}$ denote the collection of all trees in $\Delta M_k$ such that each maximal
path has length at least three and the vertices on the second level are labeled
$1_n$ (for the appropriate $n$). Then $K_{\mathrm{next}}$ is a sort of next modality. Other next
modalities can be expressed likewise.
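The languages in items (1) to (3) are easy to test on a concrete tree over a Boolean alphabet. The Python sketch below is only an illustration: a tree is a nested tuple whose first component is the bit 0 or 1 (standing for the letter $0_n$ or $1_n$, with $n$ the number of children), a variable leaf is `None`, and the function names are hypothetical.

```python
VAR = None  # a variable leaf

def vertices(t):
    """Yield the labeled (non-variable) vertices of t, root first."""
    if t is VAR:
        return
    yield t
    for c in t[1:]:
        yield from vertices(c)

def count_ones(t):
    return sum(1 for v in vertices(t) if v[0] == 1)

def in_K_exists(t):
    """Item (1): at least one vertex carries a label 1_n."""
    return count_ones(t) >= 1

def in_K_mod(t, r, p):
    """Item (2): the number of vertices labeled 1_n is congruent to r mod p."""
    return count_ones(t) % p == r

def in_K_path(t):
    """Item (3): all inner vertices on some maximal root-to-leaf path are 1."""
    if t is VAR or len(t) == 1:   # a leaf ends the path
        return True
    return t[0] == 1 and any(in_K_path(c) for c in t[1:])

t1 = (0, (1, (0,), VAR), (0,))   # exactly one vertex labeled 1, root labeled 0
t2 = (1, (0, (0,), VAR), (0,))   # root labeled 1 with a leaf as second child
t3 = (0, (0,), (0,))             # no vertex labeled 1
```

In the quantifier semantics these tests would be applied to the relabeled tree built from the family of formulas, not to the input tree itself.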

For a class $\mathcal{K}$ of tree languages, we let Lind($\mathcal{K}$) denote the logic defined above,
equipped with Lindström quantifiers associated to the languages in $\mathcal{K}$, and we
let $\mathcal{L}ind(\mathcal{K})$ denote the class of Lind($\mathcal{K}$)-definable tree languages: a language
$L \subseteq \Sigma M_k$ belongs to $\mathcal{L}ind(\mathcal{K})$ iff there is a sentence $\varphi$ of rank $k$ over $\Sigma$ all of
whose Lindström quantifiers are associated to languages in $\mathcal{K}$ such that $L$ is the
set of those trees $t \in \Sigma M_k$ that satisfy $\varphi$.

Example 10. Let $\mathcal{K}_\exists$ be the class of all the languages of the form $K(\exists)$ on a
Boolean ranked alphabet (see Example 9 (1)). One can verify that $\mathcal{L}ind(\mathcal{K}_\exists)$ is
exactly the class of FO-definable tree languages. And when $\mathcal{K}_{\exists,\mathrm{mod}}$ is the class
of all languages of the form $K(\exists)$ or $K(\exists_p^r)$, then $\mathcal{L}ind(\mathcal{K}_{\exists,\mathrm{mod}})$ is the class of
(FO + MOD)-definable tree languages.

Theorem 11. Let $\mathcal{K}$ be a class of tree languages.

• $\mathcal{K} \subseteq \mathcal{L}ind(\mathcal{K})$, $\mathcal{K}_1 \subseteq \mathcal{K}_2 \Rightarrow \mathcal{L}ind(\mathcal{K}_1) \subseteq \mathcal{L}ind(\mathcal{K}_2)$ and
$\mathcal{L}ind(\mathcal{L}ind(\mathcal{K})) = \mathcal{L}ind(\mathcal{K})$ (that is, $\mathcal{L}ind$ is a closure operator).

• $\mathcal{L}ind(\mathcal{K})$ is closed under the Boolean operations and inverse morphisms between
finitely generated free gp’s. It is closed under quotients iff any quotient of
a language in $\mathcal{K}$ belongs to $\mathcal{L}ind(\mathcal{K})$.

It will follow from our main result that if $\mathcal{K}$ consists of recognizable tree
languages, then so does $\mathcal{L}ind(\mathcal{K})$. This can also be proved directly by expressing
the Lindström quantifiers associated to the languages in $\mathcal{K}$ in monadic second-order
logic.

Corollary 12. Let $\mathcal{K}$ be a class of recognizable tree languages. Then $\mathcal{L}ind(\mathcal{K})$
is a literal variety iff any quotient of a language in $\mathcal{K}$ belongs to $\mathcal{L}ind(\mathcal{K})$ (e.g.
if $\mathcal{K}$ is closed under quotients).

$\mathcal{L}ind(\mathcal{K})$ is a variety iff any quotient and any inverse image of a language in
$\mathcal{K}$ under a morphism between finitely generated free preclones belongs to $\mathcal{L}ind(\mathcal{K})$.



5 Algebraic Characterization of $\mathcal{L}ind(\mathcal{K})$



5.1 Block Product 



The block product of monoids was introduced in [25] as a two sided generalization 
of the wreath product [11]. It is closely related to Eilenberg’s bimachines and 
triple products [11]. Block products of monoids have been used extensively in 
[28] to obtain algebraic characterizations of the expressive power of certain logics 
on finite words. In this section we extend this operation to preclones and gp’s. 

Let $S, T$ be preclones and $k \geq 0$. For each $m \geq 0$, let $I_{k,m}$ be the set
of all $m$-ary contexts in $T_k$ (see Section 3). Then for each $m \geq 0$, we let
$(S \Box_k T)_m = S_m^{I_{k,m}} \times T_m$. This defines the carriers of the block product $S \Box_k T$.
The operation of composition is defined as follows. For simplicity, we denote an
element $(u, k_1, k_2, v)$ of $I_{k,m}$ by just $(u, v)$, where it is understood that $u$ comes
with a splitting of its argument sequence as $k_1 + 1 + k_2$. Let $(F, g) \in (S \Box_k T)_n$,
let $(F_i, g_i) \in (S \Box_k T)_{m_i}$ for each $i \in [n]$ and let $m = \sum_{i \in [n]} m_i$. Then we let

$$(F, g) \cdot ((F_1, g_1) \oplus \cdots \oplus (F_n, g_n))$$

be the element $(F', g \cdot (g_1 \oplus \cdots \oplus g_n))$ of $(S \Box_k T)_m$ such that, for each $(u, v) \in I_{k,m}$
(using the notation of Section 3),

$$F'(u, v) = F(u, g_1 \cdot \bar{v}_1 \oplus \cdots \oplus g_n \cdot \bar{v}_n) \cdot (F_1(u_1, \bar{v}_1) \oplus \cdots \oplus F_n(u_n, \bar{v}_n)),$$

where $\bar{v}_1$ denotes the $\oplus$-sum of the first $m_1$ $v_i$’s, $\bar{v}_2$ denotes the $\oplus$-sum of the
next $m_2$ $v_i$’s, etc, and where for each $1 \leq i \leq n$,

$$u_i = u \cdot (\mathbf{k}_1 \oplus g \cdot (g_1 \cdot \bar{v}_1 \oplus \cdots \oplus g_{i-1} \cdot \bar{v}_{i-1} \oplus 1 \oplus g_{i+1} \cdot \bar{v}_{i+1} \oplus \cdots \oplus g_n \cdot \bar{v}_n) \oplus \mathbf{k}_2).$$

If we let $\bar{\ell}_1$ denote the sum of the first $m_1$ $\ell_i$’s, $\bar{\ell}_2$ denote the sum of the next
$m_2$ $\ell_i$’s, etc, one can verify that $(u_i, \bar{v}_i) \in I_{k,m_i}$, where the argument sequence
of $u_i$ is split as $(k_1 + \sum_{j<i} \bar{\ell}_j) + 1 + (\sum_{j>i} \bar{\ell}_j + k_2)$. It is long but routine
to verify the following.

Proposition 13. $S \Box_k T$ is a preclone.

When $(S, A)$ and $(T, B)$ are gp’s and $k \geq 0$, we define the block product
$(S, A) \Box_k (T, B)$ as the gp $(R, C)$, where $C$ is the collection of all pairs $(F, b)$ in
$S \Box_k T$ such that $b \in B$ and $F(u, v) \in A$, for all appropriate $u, v$, and where $R$
is the sub-preclone generated by $C$ in $S \Box_k T$.

5.2 Main Theorem

We need one more technical definition before stating the main result. Let $\mathcal{K}$ be
a class of tree languages. We say that Lind($\mathcal{K}$) admits relativization if, for each
sentence $\psi$ of Lind($\mathcal{K}$), each integer $i \geq 1$ and each first-order variable $x$, there
exist formulas $\psi[>_i x]$ and $\psi[\not> x]$, with $x$ as sole free variable, such that:

• $\psi[>_i x]$ holds on $t$ if the subtree of $t$ whose root is the $i$-th child of vertex $x$
satisfies $\psi$;

• $\psi[\not> x]$ holds on $t$ if the tree $r$ obtained from $t$ by deleting the subtree rooted
at $x$ and relabeling that vertex with a variable, satisfies $\psi$.

Formally, we require the following:

• given $\psi$, a sentence of rank $\ell$, and integers $k_1, k_2$, there exists a formula $\psi[>_i x]$
of rank $k = k_1 + \ell + k_2$ such that, if $t \in \Sigma M_k$, the label of vertex $w$ of $t$ is in
$\Sigma_m$ with $m \geq i$, $t = r \cdot (\mathbf{k}_1 \oplus s \oplus \mathbf{k}_2)$ for some $s \in \Sigma M_\ell$, and the root of $s$ is the
$i$-th successor of vertex $w$ in $t$, then $(t, x \mapsto w) \models \psi[>_i x]$ if and only if $s \models \psi$;

• given integers $k_1, k_2$, a sentence $\psi$ of rank $k_1 + 1 + k_2$ and an integer $k \geq k_1 + k_2$
there exists a formula $\psi[\not> x]$ of rank $k$ such that, if $t \in \Sigma M_k$, $t = r \cdot (\mathbf{k}_1 \oplus s \oplus \mathbf{k}_2)$
and vertex $w$ of $t$ is the root of the subtree $s$, then $(t, x \mapsto w) \models \psi[\not> x]$ if and
only if $r \models \psi$.

We now state our main theorem, which extends the main result of [15] (re- 
lating to finite words) to finite trees. 

Theorem 14. Let $\mathcal{K}$ be a class of recognizable tree languages such that any
quotient of a language in $\mathcal{K}$ belongs to $\mathcal{L}ind(\mathcal{K})$ and such that Lind($\mathcal{K}$) admits
relativization. Then a language is in $\mathcal{L}ind(\mathcal{K})$ iff its syntactic gp belongs to the
least pseudovariety of finitary gp’s containing the syntactic gp’s of the languages
in $\mathcal{K}$ and closed under block products.

5.3 Applications 

The class of all recognizable tree languages is closed under taking quotients and 
one verifies easily that the corresponding logic admits relativizations. It follows 
that: 

Corollary 15. If $\mathcal{K}$ consists of recognizable languages, then so does $\mathcal{L}ind(\mathcal{K})$.

By Example 10, the class of FO-definable tree languages is $\mathcal{L}ind(\mathcal{K}_\exists)$. One
can verify that Lind($\mathcal{K}_\exists$) admits relativization, and any quotient of a language
in $\mathcal{K}_\exists$ belongs to $\mathcal{L}ind(\mathcal{K}_\exists)$. In order to use Theorem 14, we need to compute
the syntactic gp’s of the tree languages in $\mathcal{K}_\exists$. Let $\Delta$ be a Boolean alphabet,
$k \geq 0$ and $K \subseteq \Delta M_k$ be as in Example 9 (1). It is not difficult to verify that
the syntactic $\Delta$-algebra of $K$ has two elements, say $B = \{\mathrm{true}, \mathrm{false}\}$, and if
$\Delta_n \neq \emptyset$, then $1_n$ is the constant function true and $0_n$ is the $n$-ary or function. By
Proposition 6, the syntactic gp of $K$ is the pair $(T, \Delta)$ where $T$ is the sub-preclone
of $T(B)$ generated by $\Delta$.

Now let $T_\exists$ be the sub-preclone of $T(B)$ generated by the binary or function
and the nullary constants true and false: then $(T_\exists)_n$ consists of the $n$-ary or
function and the $n$-ary constant true. One can verify that no proper sub-preclone
of $T_\exists$ contains the nullary constants true and false, and some $n$-ary or function
($n \geq 2$). Since we are assuming that $\Delta_n \neq \emptyset$ for some $n \geq 2$, it follows that the
syntactic gp of $K$ is a pair $(T_\exists, \Delta)$.
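The description of this preclone can be sanity-checked by brute force on small ranks. The Python sketch below is an illustration under stated assumptions: functions on $B$ are compared through their truth tables, the names `table`, `compose` and `closed` are hypothetical, and the check only covers closure under composition for outer rank at most 3 and inner ranks at most 2. It does not verify that the family is generated by the binary or and the constants, nor the minimality claim.

```python
from itertools import product

B = (False, True)

def table(f, n):
    """Truth table of an n-ary function on B, as a tuple."""
    return tuple(f(*xs) for xs in product(B, repeat=n))

def or_fn(*xs):
    return any(xs)      # with no arguments this is the constant false

def true_fn(*xs):
    return True         # the constant true of any rank

def compose(f, gs):
    """Preclone composite of f with the g_i, each given as (function, rank)."""
    def h(*args):
        pos, inner = 0, []
        for g, m_i in gs:
            inner.append(g(*args[pos:pos + m_i]))
            pos += m_i
        return f(*inner)
    return h, sum(m_i for _, m_i in gs)

def closed(max_n=3, max_m=2):
    """Check that composites of or/true functions are again or/true functions."""
    for n in range(max_n + 1):
        for f in (or_fn, true_fn):
            for ranks in product(range(max_m + 1), repeat=n):
                for inner in product((or_fn, true_fn), repeat=n):
                    h, m = compose(f, list(zip(inner, ranks)))
                    allowed = {table(or_fn, m), table(true_fn, m)}
                    if table(h, m) not in allowed:
                        return False
    return True

result = closed()
```

The check succeeds because an or of inner functions is the constant true as soon as one inner function is, and is otherwise the or of all arguments.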

Next let $\mathbf{K}_\exists$ be the class of gp’s whose underlying preclone is isomorphic to
$T_\exists$. By Theorem 14, a tree language $L$ is FO-definable if and only if its syntactic
gp lies in the least pseudovariety of gp’s containing $\mathbf{K}_\exists$ and closed under block
product. Next one verifies that a gp belongs to this pseudovariety if and only if
its underlying preclone lies in the least pseudovariety of preclones containing $T_\exists$
and closed under block product. Finally, we get the following result.

Corollary 16. A tree language is FO-definable iff its syntactic preclone belongs
to the least pseudovariety containing $T_\exists$ and closed under block product.

Let $p \geq 2$ and let $B_p = \{0, 1, \ldots, p-1\}$. Let $T_p$ be the sub-preclone of $T(B_p)$
whose rank $n$ elements ($n \geq 0$) consist of the mappings
$f_{n,r} \colon (r_1, \ldots, r_n) \mapsto r_1 + \cdots + r_n + r \bmod p$ for $0 \leq r < p$. By a reasoning similar to that used for
Corollary 16, we can show the following.

Corollary 17. A tree language is (FO + MOD)-definable iff its syntactic preclone
belongs to the least pseudovariety containing $T_\exists$ and the $T_p$ and closed under block
product.
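As a short worked check, the mappings of the modular preclone are closed under preclone composition. The computation below assumes that $f_{n,r}$ sends $(r_1, \ldots, r_n)$ to $r_1 + \cdots + r_n + r \bmod p$; the symbols $s_i$, $x_j$, $J_i$ and $r'$ are introduced here only for the calculation.

```latex
% Assumption: f_{n,r}(r_1, ..., r_n) = r_1 + ... + r_n + r  (mod p).
% m = m_1 + ... + m_n, and J_i is the block of m_i consecutive argument
% positions consumed by the i-th inner function.
\begin{align*}
\bigl(f_{n,r} \cdot (f_{m_1,s_1} \oplus \cdots \oplus f_{m_n,s_n})\bigr)(x_1, \ldots, x_m)
  &= \sum_{i=1}^{n} \Bigl(s_i + \sum_{j \in J_i} x_j\Bigr) + r \bmod p \\
  &= x_1 + \cdots + x_m + \Bigl(r + \sum_{i=1}^{n} s_i\Bigr) \bmod p \\
  &= f_{m,r'}(x_1, \ldots, x_m),
  \qquad r' = \Bigl(r + \sum_{i=1}^{n} s_i\Bigr) \bmod p .
\end{align*}
```

So a composite of such mappings is again of the same form, with the additive constants summed modulo $p$.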

6 Conclusion 

We reduced the characterization of the expressive power of certain naturally 
defined logics on tree languages, a chief example of which is given by first-order 
sentences, to an algebraic problem. 

For this purpose, we introduced a new algebraic framework to discuss tree 
languages. However, the resulting notion of recognizability coincides with the 
usual one: we simply gave ourselves a richer algebraic set-up to classify recog- 
nizable tree languages. This does not yield directly a decidability result for, say, 
first-order definable tree languages, but we can now look for a solution of this 
problem based on the methods of algebra. In this process, it will probably be 
necessary to develop the structure theory of preclones, to get more precise results 
on the block product operation. 

A positive aspect of our approach is its generality: it is not restricted to the 
characterization of logics based on the use of Lindstrom quantifiers (nor indeed to 
the characterization of logics). For instance, the use of wreath products instead
of block products will yield algebraic characterizations for other natural classes
of recognizable tree languages.

Abstract. We present a spectrum of randomized time-space tradeoffs
for solving directed graph connectivity or STCONN in small space. We
use a strategy parameterized by a parameter $k$ that uses $k$ pebbles and
performs short random walks of length $n^{1/k}$ using a probabilistic counter.
We use this to get a family of algorithms that ranges between $\log^2 n$ and
$\log n$ in space and $2^{\log^2 n}$ and $n^n$ in running time. Our approach allows
us to look at Savitch’s algorithm and the random walk algorithm as two
extremes of the same basic divide and conquer strategy.



1 Introduction 

The Graph Reachability problem is central to the study of algorithms that run 
in small space. Characterizing the complexities of the directed and undirected
versions of this problem is an important problem that has received much atten- 
tion. 

Undirected s-t connectivity or USTCONN is in NL. It can also be decided in
RL by taking a random walk on the graph [1]. Nisan [9] derandomizes the random
walk using a pseudorandom generator for small space and proves that it lies in
SC. The resulting algorithm runs in polynomial time and space $O(\log^2 n)$. More
space efficient deterministic algorithms are known, though they do not run in
polynomial time. The algorithm due to Nisan et al. [10] runs in DSPACE($\log^{3/2} n$).
Subsequently Armoni et al. [2] gave an algorithm that runs in DSPACE($\log^{4/3} n$).
Feige [7] shows a family of randomized polynomial time algorithms that range
between breadth first search and the RL random walk algorithm.

Directed graph connectivity or STCONN is complete for NL under logspace
reductions. It seems to have higher complexity than USTCONN for deterministic
and randomized complexity classes. It is in DSPACE($\log^2 n$) by Savitch’s
theorem [14]. Savitch’s algorithm is not a polynomial time algorithm: it runs in
time $2^{\log^2 n}$. There has been little success in designing algorithms that simultaneously
run in small space and time. STCONN is not known to lie in SC. The
best tradeoff known currently [3] does give polynomial time for $o(n)$ space, but
not for space $n^{1-\epsilon}$ for any constant $\epsilon$. It is also not known whether we can use
randomness to reduce the space required. A simple random walk with restarts
[8] works in space $\log n$ but the running time is doubly exponential in the space.

Lower bounds for STCONN have been proved in several restricted models.
See [13], [17] for a survey of some known results. In particular a lower bound
of $\Omega(\log^2 n / \log\log n)$ is known in the JAG model of Cook and Rackoff [5] and some of its
extensions. For the RJAG model Berman and Simon [4] show that this bound
holds for any algorithm running in time $2^{\log^{O(1)} n}$. Poon [12] generalizes this
result to show a space lower bound of $\Omega(\log^2 n / (\log\log n + \log\log T))$ for an algorithm that runs
in expected time $T$ on a randomized NNJAG, a generalization of the RJAG model.

Our Results 

We present a spectrum of randomized time-space tradeoffs for solving directed 
graph connectivity or STCONN in small space. The spectrum ranges from 
Savitch’s algorithm to the random walk with restarts algorithm of [8]. We use 
a strategy parameterized by a parameter k that uses k pebbles and performs 
short random walks of length $d = n^{1/k}$. Our main theorem is

Theorem 1:

There is a randomized algorithm that solves directed graph connectivity in
$\mathrm{RTISP}(2^{d \log^2 n / \log d}, \log^2 n / \log d)$ for $2 \leq d \leq n$.

In particular 

- When $k = 1$ and $d = n$ this reduces to the random walk algorithm of [8]
which solves STCONN in space $\log n$ and expected time $n^n$.

- When $k = \log n$ and $d = 2$, we get a randomized analog of Savitch’s algorithm
[14] which takes space $\log^2 n$ and time $2^{\log^2 n}$.

- Intermediate values of $k$ give algorithms with parameters in between. If
$k = \log n / \log\log n$ and $d = \log n$, we get an algorithm that runs in space
$\log^2 n / \log\log n$ and time $2^{\log^3 n / \log\log n}$.
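The two extreme cases can be checked by direct substitution. The computation below is a sketch that assumes the bound of Theorem 1 reads time $2^{d \log^2 n / \log d}$ and space $\log^2 n / \log d$, with logarithms in base 2 and $d = n^{1/k}$, so that $k = \log n / \log d$.

```latex
% Assumed reading of Theorem 1: time 2^{d log^2 n / log d}, space log^2 n / log d.
\begin{align*}
k = 1,\ d = n: \quad
  & \text{space } \frac{\log^2 n}{\log n} = \log n,
  && \text{time } 2^{n \log^2 n / \log n} = 2^{n \log n} = n^{n}; \\
k = \log n,\ d = 2: \quad
  & \text{space } \frac{\log^2 n}{\log 2} = \log^2 n,
  && \text{time } 2^{2 \log^2 n}.
\end{align*}
```

Under this reading the second case agrees with the running time of Savitch's algorithm up to the constant factor in the exponent.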

Our algorithms may not halt on every single run of coin tosses. Indeed it is 
known that if a machine runs in space s and must halt on every run, then it can 
take time at most $2^s$ [15]. The time taken by our algorithms however is more
than exponential in the space. 

Our algorithms can be made to run on the probabilistic NNJAG model de- 
fined by Poon in [12]. For space S < i our algorithms exactly match the 

lower bounds shown by Poon. This shows that these lower bounds are optimal 
for a large range of parameters. Previously the only lower bounds known to be 
optimal in any JAG related model were for space $\Omega(\log^2 n)$ due to Edmonds,
Poon and Achlioptas [6]. Our algorithms use a divide and conquer strategy on
the random walk. We divide a single random walk into a number of random
walks of smaller size. This requires more space but speeds up the algorithm. It
has been noted previously (e.g. [17]) that if we can solve s-t connectivity in logspace
for any distance which is $\omega(1)$, then we can beat the $\log^2 n$ space bound.
That the random walk with restarts works in small space for any distance is also 
a classical result, as is the idea of small space probabilistic clocks [8], [16]. The 
main contribution of this paper is to put these ideas together to yield an in- 
teresting trade-off result. Our approach allows us to look at Savitch’s algorithm 
and the random walk algorithm as variants of the same basic divide and conquer 
strategy. The main lemmas involve a careful analysis of the time, randomness, 
space and error probability of our hierarchical random walks. 

In the next section we outline the idea behind the algorithm and analyze 
clocks that run in small space. The algorithm and its analysis are presented 
in full detail in Section 3. We discuss the implementation of our algorithm on 
probabilistic NNJAGs in Section 4. 



2 Outline of the Algorithm 

Suppose we are given a directed graph G on n nodes and are asked to find a 
path of length d from u to v or report that no such path exists. One way to do 
this is by performing random walks of length d with restarts: Starting from u 
randomly choose the next step for d steps. If we reach v then we are done, else 
we restart at $u$. If indeed there is a path between $u$ and $v$ of length $d$, we expect
to find it in expected time $n^d$. If we run for much longer and do
not find a path then we report that no such path exists and we will be correct
with high probability. In [8] it is shown how this procedure can work in space 
log n. 
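The procedure of random walks with restarts can be sketched as follows. This is a minimal illustration, not the space-efficient implementation of [8]: the function name `walk_with_restarts`, the adjacency-list encoding, the rule of moving to a uniformly random out-neighbour, and the explicit restart budget are all assumptions of the sketch. In particular, an explicit counter is used where the paper relies on a probabilistic clock.

```python
import random

def walk_with_restarts(adj, u, v, d, max_restarts, rng):
    """One-sided test for a path of at most d steps from u to v.

    adj maps each vertex to the list of its out-neighbours. The answer
    True is always correct; the answer False may be a false negative.
    """
    for _ in range(max_restarts):
        x = u
        for _ in range(d):
            if not adj[x]:
                break                    # dead end: abandon this walk
            x = rng.choice(adj[x])       # assumed rule: uniform out-neighbour
            if x == v:
                return True
    return False

# A directed path 0 -> 1 -> 2 -> 3
adj = {0: [1], 1: [2], 2: [3], 3: []}
rng = random.Random(0)
found = walk_with_restarts(adj, 0, 3, 3, 1, rng)
```

The error is one-sided: a positive answer exhibits an actual walk, while a negative answer only says that no walk was found within the budget.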

Our algorithm uses the above procedure of random walks with restarts. We 
use a divide and conquer strategy which is analogous to the recursive doubling 
strategy of Savitch’s algorithm. Let us denote by $G^l$ the graph whose edges are
paths in $G$ of length $l$. If there is an s-t path in $G$, there is an s-t path of length
at most $n/l$ in $G^l$.

Suppose we have $k$ pebbles. Set $d = n^{1/k}$. We number our pebbles $1, \ldots, k$
and use them in a hierarchical fashion. We try to discover an s-t path in $G^{d^{k-1}}$ of
length $d$. This we achieve by performing random walks of length $d$ with restarts
on $G^{d^{k-1}}$ using pebble 1. However, we do not have $G^{d^{k-1}}$ described explicitly, so
we work recursively. We use pebble 2 to answer the queries “is $(x, y)$ an edge
in $G^{d^{k-1}}$?”, and, in general, pebble $i$ to answer edge queries about $G^{d^{k-i+1}}$. The
recursion bottoms out at pebble $k$, which we use to answer queries of the form
“is $x$ connected to $y$ by a path of length $d$ in $G$?”. Each level answers the queries
asked by the previous level by performing random walks with restarts on the 
corresponding graph. 

A crucial requirement for performing all the random walks with restarts is 
a counter that keeps track of time up to so that we know when to stop the 
random walk and report that there is no u-v path. However, this is not possible 
in logspace. So we use a probabilistic counter (as in [8], [16]), which waits for a 
successive run of heads. We show that this counter works accurately enough to
keep the errors small. 

An alternative way to represent this recursive pebbling strategy is to imagine
that pebble 1 is performing a random walk on G^{n/d}, and that it queries an oracle
for the edges of G^{n/d}. This oracle is implemented by pebbles 2 to k in a recursive
manner.



Small Space Clocks 

Consider a sequence of unbiased coin tosses (0 or 1). Consider a counter which
counts the lengths of successive runs of 1s and stops the first time it
encounters a run of length l. We call this probabilistic counter Clock(2^l).

Lemma 1. Clock(2^l) is a probabilistic procedure that runs in space log l for time
T such that Pr[T > t] < e^{-t/(l 2^{l+1})} and Pr[T < t] < t/2^l.

Proof. Let T denote the time when a run of 1s of length l first appears. We first
bound the probability that T is very large compared to 2^l. Divide the sequence
into segments of length l. Let X_i denote the random variable which is 1 if the
ith segment contains all 1s, and is 0 otherwise. It is clear that the X_i are
independent Bernoulli random variables each with expected value 2^{-l}. Also, if
T > t then X_i = 0 for i = 1, ..., t/l. Therefore Pr[T > t] ≤ Pr[\sum_{i=1}^{t/l} X_i = 0].

Hence, by the Chernoff bound, we get

Pr[T > t] < e^{-t/(l 2^{l+1})} (1)

Next, a simple union bound shows that T is unlikely to be very small
compared to 2^l. At any time, the probability that the next l coin tosses are all 1s is
2^{-l}. Hence

Pr[T < t] < t/2^l (2)

□ 

Note that the probability that the clock runs for a long time is much smaller 
than the probability that the clock runs out very quickly. However, this suffices 
for our purposes. 
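
The clock described above can be sketched in a few lines of Python. This is only an illustration: the function name `clock_time` and the convention of passing the coin tosses as an iterable are assumptions made here, not part of the paper. The only state kept is the length of the current run of 1s, which never exceeds l and so fits in O(log l) bits, in line with Lemma 1.

```python
def clock_time(l, coins):
    """Return T, the number of tosses read until the first run of l consecutive 1s.

    `coins` is any iterable of 0/1 values (hypothetical interface for this sketch).
    """
    run = 0  # length of the current run of 1s; at most l, so O(log l) bits suffice
    for t, bit in enumerate(coins, start=1):
        run = run + 1 if bit == 1 else 0
        if run == l:
            return t
    raise ValueError("coin sequence ended before a run of l ones appeared")
```

For instance, on the tosses 1, 1, 0, 1, 1, 1 with l = 3 the run is broken at the third toss and completed at the sixth, so `clock_time(3, [1, 1, 0, 1, 1, 1])` returns 6.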

3 The Algorithm 

We describe a family of algorithms parameterized by a parameter k which takes
values between 1 and log n. The input is a graph G on n vertices, and two
vertices s and t. We set d = n^{1/k}. The algorithm is recursive. Each call is given
input vertices u, v and a distance parameter d' which is an upper bound on the
distance from u to v. Thus the first call is with vertices s, t and d' = n. Calls
at the lowest level work with d' = d. Each call performs a random walk with
RWALK(u, v, d')

    Start Clock
    While (Clock is ticking)
        [RESTART]
        v_0 = u; v_cur = u
        for i = 1 to d
            v_prev = v_cur
            v_cur = a vertex chosen uniformly at random from [n]
            if d' > d
                // (recursive call)
                if RWALK(v_prev, v_cur, d'/d) = 0 then goto [RESTART]
            else
                // (bottom of recursion)
                if (v_prev, v_cur) ∉ E(G) then goto [RESTART]
        end for
        if v_cur = v then Return 1 else goto [RESTART]
    end while
    Return 0    // (Clock terminated)



Fig. 1. RWALK(u,v,d’) 



restarts on the appropriate graph and is clocked by Clock(n^'^). The algorithm 
is given in Figure 1. 
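
The recursion of Figure 1 can be sketched in Python as follows. This is a hedged illustration under several assumptions that are not taken from the paper: the graph is a list of successor sets `adj`, every vertex is treated as having a self-loop (so that d' acts as an upper bound on the distance), the clock parameter `l` is chosen by the caller, and the clock is advanced once per walk attempt rather than once per step. The names `Clock` and `rwalk` are hypothetical.

```python
import random


class Clock:
    """Probabilistic counter: stops ticking after the first run of l consecutive 1s."""

    def __init__(self, l, rng):
        self.l, self.rng, self.run = l, rng, 0

    def tick(self):
        """Toss one fair coin; return True while the clock is still ticking."""
        self.run = self.run + 1 if self.rng.getrandbits(1) else 0
        return self.run < self.l


def rwalk(adj, u, v, d_prime, d, l, rng):
    """Return 1 if a u-v path of length at most d_prime is found, else 0.

    d_prime is assumed to be a power of d; `l` is an illustrative clock parameter.
    """
    n = len(adj)
    clock = Clock(l, rng)
    while clock.tick():  # each pass of the loop is one attempt, i.e. one [RESTART]
        cur, ok = u, True
        for _ in range(d):
            prev, cur = cur, rng.randrange(n)  # guess the next vertex at random
            if d_prime > d:
                # recursive call: ask the level below about a shorter path
                ok = rwalk(adj, prev, cur, d_prime // d, d, l, rng) == 1
            else:
                # bottom of the recursion: direct edge query (self-loops assumed)
                ok = prev == cur or cur in adj[prev]
            if not ok:
                break
        if ok and cur == v:
            return 1
    return 0  # clock terminated
```

A call such as `rwalk([{1}, {2}, set(), {0}], 0, 2, 4, 2, 4, random.Random(0))` runs two levels with d = 2. The sketch keeps the one-sided error of the algorithm: it can miss an existing path, but it never reports a path to a vertex that is unreachable.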

We first show that the algorithm indeed terminates with high probability. 
The main technical difficulty here is the following. The running time of the 
algorithm is essentially the sum of the running times of all the random walks 
performed. Each random walk runs for as long as the associated Clock runs. If 
this clock runs too long, so does the random walk. Each step of the random walk 
invokes a new random walk at the lower level (i.e. for a smaller distance). Hence 
there is no a priori bound even on the total number of random walks performed;
it depends on the length of each random walk. We prove our bound through
induction. By level j of the algorithm we will mean the jth level of recursion in 
the algorithm. Random walks at level j are performed using pebble j. There are 
k levels in all. 

Lemma 2. The algorithm terminates in time with very high probability.

Proof. We use induction on the level j to prove the following: With probability 
at least 1 — for all I < j, there are no more than random walks 

performed at the Ith level of the recursion, and each of these random walks runs 
for time less than 

The base case of the induction is true since only one random walk is performed 
at the first level and by Lemma 1 with probability at least 1 — e“" it does not 
run for time greater than Assume that the inductive hypothesis is true for 
level j. The number of random walks at level j + 1 is equal to the total number
of steps performed in all the random walks at level j. Thus with probability at 
least 1 — no more than x n’^'^ = walks run at level j + 1. Given 
this, the probability that any of these walks runs for more than nJ'^ is bounded

7dj 

by Hence the statement is true for j -I- 1 and this concludes the induction. 

7dk 

Since there are k levels, with probability 1 — the total number of walks 

is bounded by and none of them runs for more than 

This implies that the algorithm runs in time , □ 

We now argue that with good probability the algorithm does indeed find an 
s-t path if one exists. 

Notice first that the algorithm only errs in one direction: if there is no
s-t path the algorithm will certainly return 0. Now a call to RWALK (say
RWALK(u, v, d')) at any level of the recursion essentially consists of two parts:
while the clock runs it randomly chooses d vertices to perform the random walk
between u and v, and it asks edge queries to the level below. Call a sequence of
vertices x_0 = u, x_1, x_2, ..., x_d = v a valid sequence if in the graph G there is a
path of length d'/d from x_i to x_{i+1} for all i = 0, ..., d − 1. We prove first that
any call RWALK(u, v, d') finds a valid sequence with high probability if there is
a path of length d' from u to v in G.

Lemma 3. If there is a u-v path of length d' in G, RWALK(u,v,d' ) finds a 
valid sequence with probability at least 1 — 

Proof. By Lemma 1, we know that Clock(n^‘^) runs for time at least with 
probability at least 1— The probability that RWALK(u, v, d') does not find a 
particular d length sequence in this time is at most ^ . So the error probability 
is at most □ 

Having RWALK(u, v, d') choose a valid sequence with high probability is not
enough. We also need the calls at all lower levels to return correct answers. Here 
the main issue is that we are running about number of random walks, and 
the probability of error of each random walk is about . Hence clearly many 
of the walks will return incorrect answers. But the key is that for most instances, 
we do not care. We only require those queries which actually pertain to the s-t 
path to be answered correctly. For this we argue in a bottom up manner. For 
convenience, we define a notion of height of a pebble which is the inverse of the 
level. The pebble at level k is at height 1 whereas the pebble at level 1 is at 
height k. 

Lemma 4. If there is an s-t path in G, RWALK(s, t, n) will find it with probability
at least 1 ^ . 

Proof. By induction on h, we prove that the pebble at height h answers any 
single query incorrectly with probability at most 

If h = 1, then this follows from Lemma 3 since finding a path between u and
v is equivalent to finding a valid sequence (since at height 1 we have direct access
to G, and do not need to make any edge queries). Assume that the statement
holds for height h − 1. Consider the pebble at height h which is asked if u is
adjacent to v. If a u-v path exists, then by Lemma 3, it guesses a valid sequence
with probability at least 1 — It now makes d queries to the pebble at level 
h − 1. That pebble returns correct answers to all these d queries with probability
at least 1 — d^-~g Hence the probability that pebble at height ft answers 
the query incorrectly is at most ^ + :^ = 

The overall failure probability of RWALK(s,t,n) is the probability that the 
pebble at height k answers the question “is there a path from s to t?” incorrectly. 
By the induction above, this is at most ^ ^ which goes to 0 asymptotically. 

□ 



We finally prove: 

Lemma 5. The algorithm runs in space O(k log n) = O(log^2 n / log d).

Proof. There are k levels of recursion. At each level there is only one random
walk running at any given time. Each level keeps a clock, a start and end vertex
and two current vertices. So each level takes O(log n) storage. Hence the total
storage is O(k log n). □

We have proved our main Theorem: 

Theorem 1. There is an algorithm that solves directed graph connectivity in
RTISP(2^{O(d log^2 n / log d)}, log^2 n / log d) for 2 ≤ d ≤ n, or equivalently in
RTISP(2^{O(k n^{1/k} log n)}, k log n) for 1 ≤ k ≤ log n.

It is interesting to note that by setting d = 2, we perform random walks 
of length 2. This is equivalent to guessing the midpoint v between s and t and 
then the midpoint of each of these paths and so on. If instead of guessing, we 
try all possible vertices, this is precisely Savitch’s algorithm. When d = n we 
get the logspace random walk with restarts of [8] . Intermediate values of d give 
an algorithm with parameters in between. In particular, if d = log n, we get an
algorithm that runs in space log^2 n / log log n and time 2^{O(log^3 n / log log n)}.
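
As a worked check of the space side of this tradeoff, the relation d = n^{1/k} can be unfolded as follows. This is a sketch derived only from the definitions above and from the space bound k log n of Lemma 5; logarithms are taken to base 2 and constant factors are ignored.

```latex
\begin{align*}
d = n^{1/k} \;&\Longleftrightarrow\; k = \frac{\log n}{\log d},\\
S = k \log n &= \frac{\log^2 n}{\log d},\\
d = 2:\quad S &= \log^2 n \quad\text{(the space of Savitch's algorithm)},\\
d = \log n:\quad S &= \frac{\log^2 n}{\log\log n},\\
d = n:\quad S &= \log n \quad\text{(the logspace random walk with restarts)}.
\end{align*}
```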



4 Implementation on a Probabilistic NNJAG 

The NNJAG model is a restricted model of computation for directed graph 
connectivity. It consists of an automaton with a set of internal states q, a set 
of pebbles k, and a transition function δ. A probabilistic NNJAG has access to
random bits at each step and the transition function can depend on the bits 
read. At any time the NNJAG only knows the names of the nodes on which 
it has pebbles. It is allowed to move a pebble to an adjacent node, or to jump 
a pebble to a node on which there are already other pebbles. The space S is 
defined as k log n + log q. The time taken T is the number of moves made. 

We shall briefly give an informal description of how algorithm RWALK can be 
simulated on a probabilistic NNJAG [11]. The counter for the clock is maintained 
in the internal state of the NNJAG. Similarly at each step, the guess for the k 
vertices on the path is stored with the internal state. We then move pebbles
along the edge and compare them with the nodes stored in the internal state to 
see if we have reached the target node. 



Theorem 2 ([12]). Any probabilistic NNJAG for STCONN on n node graphs that
runs in expected time T and uses space S satisfies S = Ω(log^2 n / (log log n + log log T)).



Theorem 3. For S ≤ log^2 n / log log n, there exists a probabilistic NNJAG that meets
the lower bound of Theorem 2 up to constant factors.

Proof: Since the lower bound on S by Theorem 2 is less than or equal to
log^2 n / log log n, we can hope to show our algorithm is tight only when the space is less
than log^2 n / log log n, or equivalently when d = Ω(log n). In Theorem 1,

log log T = Θ(log d + log log n) = Θ(log d)

The bound implied by Theorem 2 for an NNJAG with this expected running
time is

S = Ω(log^2 n / (log log n + log log T)) = Ω(log^2 n / log d)

The space bound achieved by Theorem 1 matches this lower bound up to
constant factors. □
Abstract. Given a polygonal path P with vertices p_1, p_2, ..., p_n and a
real number t ≥ 1, a path Q = (p_{i_1}, p_{i_2}, ..., p_{i_k}) is a t-distance-preserving
approximation of P if 1 = i_1 < i_2 < ... < i_k = n and each straight-line
edge (p_{i_j}, p_{i_{j+1}}) of Q approximates the distance between p_{i_j} and p_{i_{j+1}}
along the path P within a factor of t. We present exact and approximation
algorithms that compute such a path Q that minimizes k (when
given t) or t (when given k). We also present some experimental results.

1 Introduction 

Let P be a polygonal path through the sequence of points p_1, p_2, ..., p_n. We
consider the problem of approximating P by a “simpler” polygonal path Q. Imai 
and Iri [9,10] introduced two different versions of this problem. In the first one, 
we are given an integer k and want to compute a polygonal path Q that has k 
vertices and approximates P in the best possible way according to some measure 
that compares P and Q. In the second version, we are given a tolerance ε > 0
and want to compute a polygonal path Q that approximates P within ε and has
the fewest vertices. Both versions have been considered for different measures 
that are based on variations of the notion of minimum distance between P and 
Q; see [1,2,4,5,7,8,9,10,11,12]. 

These problems have many applications in map simplification. In this paper, 
we consider distance-preserving approximations of polygonal paths. Distance- 
preserving simplifications are particularly meaningful for a path representing 
a meandering river or a winding road; the approximations simplify such paths 
without substantially distorting the length and distance information. Clearly
there is a trade-off between how simple Q is made and how closely distances in 
Q reflect those in P. We now define our novel concept more precisely. 

We denote the Euclidean distance between any two points p and q in the plane
by |pq|. For any two vertices p_i and p_j of P, let δ(p_i, p_j) denote the Euclidean

* Gudmundsson was supported by NWO, Smid was supported by NSERC. 



P.K. Pandya and J. Radhakrishnan (Eds.): FSTTCS 2003, LNCS 2914, pp. 217-228, 2003. 
© Springer- Verlag Berlin Heidelberg 2003 







J. Gudmundsson, G. Narasimhan, and M. Smid 



distance between p_i and p_j along P, i.e., δ(p_i, p_j) = \sum_{l=i}^{j-1} |p_l p_{l+1}|. Let t ≥ 1
be a real number. We say that a path Q = (p_{i_1}, p_{i_2}, ..., p_{i_k}) is a t-distance-preserving
approximation of P if (i) 1 = i_1 < i_2 < ... < i_k = n, and (ii)
δ(p_{i_j}, p_{i_{j+1}}) ≤ t |p_{i_j} p_{i_{j+1}}| for all j with 1 ≤ j < k. Hence, the straight-line edge
(p_{i_j}, p_{i_{j+1}}) approximates the distance between p_{i_j} and p_{i_{j+1}} along the path P
within a factor of t. The following two problems will be considered in this paper. 

Problem 1. Given a polygonal path P with n vertices and a real number t ≥ 1,
compute a t-distance-preserving approximation of P having the minimum
number of vertices.



Problem 2. Given a polygonal path P with n vertices and an integer k with
2 ≤ k ≤ n, compute the minimum value of t for which a t-distance-preserving
approximation of P having at most k vertices exists. 
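
To make the definition concrete, the following Python sketch checks conditions (i) and (ii) for a candidate path Q given by its vertex indices. It is an illustration only: the function names, the 0-based indexing, and the representation of points as coordinate tuples are assumptions made here. The path distance δ is obtained from prefix sums of the edge lengths of P.

```python
import math
from itertools import accumulate


def prefix_lengths(points):
    """x[i] = distance from points[0] to points[i] along the path (x[0] = 0)."""
    steps = (math.dist(p, q) for p, q in zip(points, points[1:]))
    return [0.0, *accumulate(steps)]


def is_distance_preserving(points, indices, t):
    """Check conditions (i) and (ii) for the 0-based vertex indices of Q."""
    x = prefix_lengths(points)
    # (i) Q starts at the first vertex of P and ends at the last one
    if indices[0] != 0 or indices[-1] != len(points) - 1:
        return False
    for a, b in zip(indices, indices[1:]):
        if a >= b:  # indices must be strictly increasing
            return False
        # (ii) distance along P is at most t times the straight-line distance
        if x[b] - x[a] > t * math.dist(points[a], points[b]):
            return False
    return True
```

For the staircase path (0,0), (1,0), (1,1), (2,1), the single edge from the first to the last vertex has path length 3 and straight-line length sqrt(5), so it is t-distance-preserving for t = 1.5 but not for t = 1.2.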

We start in Section 2 by giving simple algorithms that solve Problems 1
and 2 in O(n^2) and O(n^2 log n) time, respectively. In the rest of the paper, we
consider approximation algorithms for both problems. In Section 3, we introduce
a heuristic algorithm for Problem 1 that uses Callahan and Kosaraju’s
well-separated pair decomposition [3]. We use this decomposition to define a directed
graph having O(n^2) edges that can be represented implicitly in O(n) space. This
graph has the property that a simple shortest path computation (implemented
using breadth-first search) gives an “approximation” to Problem 1. To be more
precise, the main result in Section 3 is the following. Given real numbers t ≥ 1
and 0 < ε < 1/3, let κ be the minimum number of vertices on any
t-distance-preserving approximation of P. In O(n log n + (t/ε)n) time, we can compute a
((1 + ε)t)-distance-preserving approximation Q of P, having at most κ vertices.
In other words, our heuristic may result in an approximation Q of P that is
distance-preserving for a slightly larger value of t than desired. In Section 4, we
give an approximation algorithm for Problem 2. That is, we use the result of
Section 3 and the well-separated pair decomposition to design a simple binary
search algorithm that computes, in O((t*/ε) n log n) time, a real number t such
that t ≤ t* ≤ (1 + ε)t, where t* is the exact solution for Problem 2. In Section 5,
we present some experimental results.



2 Simple Exact Algorithms 

Consider again the polygonal path P = (p_1, p_2, ..., p_n), and let t ≥ 1 be a real
number. For any two indices i and j with 1 ≤ i < j ≤ n, we say that the ordered
pair (p_i, p_j) is t-distance-preserving if δ(p_i, p_j) ≤ t |p_i p_j|.

Consider the directed graph G_t with vertex set {p_1, p_2, ..., p_n} and edge
set the set of all t-distance-preserving pairs of vertices. It is clear that any
t-distance-preserving approximation of P having k vertices corresponds to a path
in G_t from p_1 to p_n having k − 1 edges, and vice versa. (Observe that (p_i, p_{i+1})
is an edge in G_t for each i with 1 ≤ i < n. Therefore, there exists a path in
G_t from p_1 to p_n.) It follows that Problem 1 can be solved by constructing the
graph G_t and computing a shortest path from p_1 to p_n. The latter can be done
by performing a breadth-first search in G_t using p_1 as the source vertex. Hence,
Problem 1 can be solved in a time that is proportional to the sum of the number
of vertices and edges in G_t. Since the latter is O(n^2), we obtain a time bound of
O(n^2).

Theorem 1. Given a polygonal path P with n vertices and a real number t ≥ 1,
a t-distance-preserving approximation of P having the minimum number of
vertices can be computed in O(n^2) time.
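
The exact algorithm behind Theorem 1 can be sketched as a breadth-first search that tests the edges of G_t on the fly. This is a minimal illustration, not the authors' implementation: the function name, 0-based indices, and tuple representation of points are assumptions. Consecutive vertices are always treated as edges, as observed above, which also guards against floating-point rounding when t = 1.

```python
import math
from collections import deque
from itertools import accumulate


def min_vertex_approximation(points, t):
    """Return 0-based vertex indices of a minimum-vertex t-distance-preserving path."""
    n = len(points)
    # x[i] = distance from points[0] to points[i] along the path
    x = [0.0, *accumulate(math.dist(p, q) for p, q in zip(points, points[1:]))]
    parent = [None] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            break
        for j in range(i + 1, n):
            # (p_i, p_j) is an edge of G_t iff delta(p_i, p_j) <= t * |p_i p_j|
            is_edge = j == i + 1 or x[j] - x[i] <= t * math.dist(points[i], points[j])
            if not seen[j] and is_edge:
                seen[j] = True
                parent[j] = i
                queue.append(j)
    # follow parent pointers back from the last vertex to the first
    path = [n - 1]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return path[::-1]
```

Every pair (i, j) is examined at most once, which gives the O(n^2) bound. On the staircase path (0,0), (1,0), (1,1), (2,1), the sketch returns the two endpoints for t = 1.5 and all four vertices for t = 1.2.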

We now show that Theorem 1 can be used to solve Problem 2. Consider
again the graph G_t defined above. Let κ_t be the minimum number of vertices on
any t-distance-preserving approximation of P. If t and t' are real numbers with
t' > t ≥ 1, then κ_{t'} ≤ κ_t, because G_t is a subgraph of G_{t'}. Problem 2 asks for
the smallest real number t ≥ 1 such that κ_t ≤ k. We denote this value of t by t*.
For any two indices i and j with 1 ≤ i < j ≤ n, let t*_{ij} := δ(p_i, p_j)/|p_i p_j|. The
family (G_t)_{t ≥ 1} consists of the (n choose 2) graphs G_t with t ∈ C := {t*_{ij} : 1 ≤ i < j ≤ n}.
Moreover, t* ∈ C. Therefore, if we sort the elements of C and perform a binary
search in the sorted set {κ_t : t ∈ C}, we obtain a solution for Problem 2. Using
Theorem 1, it follows that the running time is O(n^2 log n).

Theorem 2. Given a polygonal path P with n vertices and an integer k with 2 ≤
k ≤ n, the minimum value of t for which a t-distance-preserving approximation
of P having at most k vertices exists can be computed in O(n^2 log n) time.
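
The binary search behind Theorem 2 can be sketched as follows. This is an illustration under assumptions made here (function name, 0-based indices, points as tuples, n ≥ 2). To avoid rounding mismatches, edges of G_t are decided by comparing the precomputed ratios t*_{ij} with t directly, and consecutive vertices are assigned the ratio 1.

```python
import math
from collections import deque
from itertools import accumulate


def min_stretch_for_k(points, k):
    """Return t*, the least t admitting an approximation with at most k vertices."""
    n = len(points)
    x = [0.0, *accumulate(math.dist(p, q) for p, q in zip(points, points[1:]))]

    def ratio(i, j):
        if j == i + 1:
            return 1.0  # consecutive vertices: path and chord coincide
        chord = math.dist(points[i], points[j])
        return (x[j] - x[i]) / chord if chord > 0 else math.inf

    r = {(i, j): ratio(i, j) for i in range(n) for j in range(i + 1, n)}

    def kappa(t):
        """Minimum number of vertices for stretch t, via BFS in G_t."""
        dist = [None] * n
        dist[0] = 1
        queue = deque([0])
        while queue:
            i = queue.popleft()
            for j in range(i + 1, n):
                if dist[j] is None and r[i, j] <= t:
                    dist[j] = dist[i] + 1
                    queue.append(j)
        return dist[n - 1]

    # kappa is non-increasing in t, so binary search over the sorted candidates
    candidates = sorted(set(r.values()))
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if kappa(candidates[mid]) <= k:
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

Sorting the O(n^2) candidates and running O(log n) searches of O(n^2) time each matches the O(n^2 log n) bound. On the staircase path (0,0), (1,0), (1,1), (2,1), two vertices require t* = 3/sqrt(5), while four vertices are possible with t* = 1.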

3 A Heuristic Based on Well-Separated Pairs 

In this section, we introduce a heuristic approach for solving Problem 1 that uses
the well-separated pair decomposition of Callahan and Kosaraju [3]. We briefly
recall this decomposition in Section 3.1. In Section 3.2, we describe the idea of
our heuristic algorithm, analyze its output, and give a condition under which
it solves Problem 1 exactly. In Section 3.3, we show how the heuristic can be
implemented such that it runs in O(n log n) time.

3.1 Well-Separated Pairs

Definition 1. Let s > 0 be a real number, and let A and B be two finite sets of
points in R^d. We say that A and B are well-separated with respect to s, if there
are two disjoint balls C_A and C_B, having the same radius, such that C_A contains
A, C_B contains B, and the distance between C_A and C_B is at least equal to s
times the radius of C_A. We will refer to s as the separation ratio.

Lemma 1. Let A and B be two sets of points that are well-separated with respect
to s, let x and x' be points of A, and let y and y' be points of B. Then |xx'| ≤
(2/s)|x'y'| and |x'y'| ≤ (1 + 4/s)|xy|.
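
For intuition, Definition 1 and Lemma 1 can be checked numerically in one dimension, where a ball is an interval. The sketch below is an illustration with hypothetical function names; it assumes A lies entirely to the left of B and uses the smallest common radius, since a larger radius only makes the separation condition harder to satisfy.

```python
def well_separated_1d(A, B, s):
    """Definition 1 for finite sets of reals, assuming A lies entirely left of B."""
    if max(A) >= min(B):
        return False
    # smallest common radius of two intervals that cover A and B respectively
    radius = max(max(A) - min(A), max(B) - min(B)) / 2
    # slide the intervals apart as far as possible: the gap is min(B) - max(A)
    gap = min(B) - max(A)
    return gap >= s * radius


def lemma1_holds(A, B, s):
    """Check both inequalities of Lemma 1 for all x, x' in A and y, y' in B."""
    return all(
        abs(x - x2) <= (2 / s) * abs(x2 - y2)
        and abs(x2 - y2) <= (1 + 4 / s) * abs(x - y)
        for x in A for x2 in A for y in B for y2 in B
    )
```

With A = {0, 1}, B = {10, 11} and s = 4, the common radius is 0.5 and the gap is 9, so the sets are well-separated and both inequalities of Lemma 1 hold for every choice of points; with s = 20 the same sets are no longer well-separated.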
Definition 2 ([3]). Let S be a set of points in R^d, and let s > 0 be a real
number. A well-separated pair decomposition (WSPD) for S (with respect to s)
is a sequence {A_i, B_i}, 1 ≤ i ≤ m, of pairs of non-empty subsets of S, such
that (i) A_i ∩ B_i = ∅ for all i = 1, 2, ..., m, (ii) for each unordered pair {p, q}
of distinct points of S, there is exactly one pair {A_i, B_i} in the sequence, such
that (1) p ∈ A_i and q ∈ B_i or (2) p ∈ B_i and q ∈ A_i, (iii) A_i and B_i are
well-separated with respect to s, for all i = 1, 2, ..., m.

Callahan and Kosaraju show how such a WSPD can be computed. They
start by constructing, in O(n log n) time, a split tree T having the points of S
at its leaves. Given this tree, they show how a WSPD with m = O(s^d n) can be
computed in O(s^d n) time. In this WSPD, each pair {A_i, B_i} is represented by
two nodes u_i and v_i of T. That is, A_i and B_i are the sets of all points stored at
the leaves of the subtrees rooted at u_i and v_i, respectively.

Theorem 3 ([3]). Let S be a set of n points in R^d, and let s > 0 be a real
number. A WSPD for S (with respect to s) with m = O(s^d n) can be computed
in O(n log n + s^d n) time.

3.2 The Idea of the Heuristic 

Consider again the polygonal path P = (p_1, p_2, ..., p_n). We embed P into
one-dimensional space by “flattening” it out: For each i with 1 ≤ i ≤ n, let x_i be
the real number given by x_i = δ(p_1, p_i), and let S := {x_1, x_2, ..., x_n}.

Let s > 0 be a given separation ratio, and consider the split tree T and the
corresponding WSPD {A_i, B_i}, 1 ≤ i ≤ m, for S. We may assume without loss
of generality that any element in A_i is smaller than any element in B_i.

The following lemma shows that, for large values of s, all pairs (p, q) of
vertices of P such that δ(p_1, p) ∈ A_i and δ(p_1, q) ∈ B_i are distance-preserving
for approximately the same value of t.

Lemma 2. Let p, p', q, and q' be vertices of P, and let i be an index such that
x := δ(p_1, p) ∈ A_i, x' := δ(p_1, p') ∈ A_i, y := δ(p_1, q) ∈ B_i, and y' := δ(p_1, q') ∈
B_i. Let t ≥ 1 be a real number such that 4st + 16t < s^2. If the pair (p, q) is
t-distance-preserving, then the pair (p', q') is t'-distance-preserving, where

t' = (1 + 4/s)t / (1 − 4(1 + 4/s)t/s).

Proof. First observe that, by the condition on t, the denominator in t' is positive.
By applying Lemma 1 and the triangle-inequality, we obtain

δ(p', q') = |x'y'| ≤ (1 + 4/s)|xy| = (1 + 4/s) · δ(p, q)
    ≤ (1 + 4/s)t|pq| ≤ (1 + 4/s)t · (|pp'| + |p'q'| + |q'q|)
    ≤ (1 + 4/s)t · (δ(p, p') + |p'q'| + δ(q', q))
    = (1 + 4/s)t · (|xx'| + |p'q'| + |y'y|)
    ≤ (1 + 4/s)t · ((2/s)|x'y'| + |p'q'| + (2/s)|x'y'|)
    = (1 + 4/s)t · ((4/s)δ(p', q') + |p'q'|)
    = (4(1 + 4/s)t/s) · δ(p', q') + (1 + 4/s)t|p'q'|.

Rearranging terms yields δ(p', q') ≤ t'|p'q'|. □

Let t ≥ 1 and 0 < ε < 1/3 be real numbers, and let

s = (12 + 24(1 + ε/3)t)/ε. (1)

Lemma 3. Let p, p', q, and q' be as in Lemma 2. If the pair (p, q) is
t-distance-preserving, then the pair (p', q') is ((1 + ε/3)t)-distance-preserving. If the pair
(p, q) is ((1 + ε/3)t)-distance-preserving, then the pair (p', q') is
((1 + ε)t)-distance-preserving.
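
A quick numeric sanity check of Lemma 3 is possible by plugging the separation ratio s = (12 + 24(1 + ε/3)t)/ε, which is also the value used in Step 2 of the preprocessing in Section 3.3, into the formula for t' of Lemma 2. The function names below are hypothetical and the sketch only evaluates the formulas; it is not a proof.

```python
def separation_ratio(t, eps):
    """Separation ratio s = (12 + 24(1 + eps/3)t) / eps."""
    return (12 + 24 * (1 + eps / 3) * t) / eps


def stretched(t, s):
    """t' from Lemma 2; requires 4st + 16t < s^2 so the denominator is positive."""
    if not 4 * s * t + 16 * t < s * s:
        raise ValueError("Lemma 2 needs 4st + 16t < s^2")
    return (1 + 4 / s) * t / (1 - 4 * (1 + 4 / s) * t / s)
```

For example, t = 2 and ε = 0.3 give s of about 216 and t' of about 2.117, which is below (1 + ε/3)t = 2.2; applying the formula again to 2.2 gives about 2.338, which is below (1 + ε)t = 2.6.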

For each i with 1 ≤ i ≤ m, let a_i and b_i be fixed elements of A_i and B_i,
respectively, and let f_i and g_i be the vertices of P such that a_i = δ(p_1, f_i) and
b_i = δ(p_1, g_i). We say that the ordered pair (A_i, B_i) is (t, ε)-distance-preserving
if the pair (f_i, g_i) is ((1 + ε/3)t)-distance-preserving. We define a directed graph
H whose vertices are the 2m sets A_i and B_i, 1 ≤ i ≤ m. The edges of H are
defined as follows. For any i with 1 ≤ i ≤ m, (A_i, B_i) is an edge in H if (A_i, B_i)
is (t, ε)-distance-preserving and x_n ∈ B_i. For any i and j with 1 ≤ i ≤ m and
1 ≤ j ≤ m, (A_i, A_j) is an edge in H if (A_i, B_i) is (t, ε)-distance-preserving and
A_j ∩ B_i ≠ ∅.

Let Q = (q_1, q_2, ..., q_k) be an arbitrary t-distance-preserving approximation
of the polygonal path P. We will show that Q corresponds to a path in H from
some set A_i that contains x_1 to some set B_j that contains x_n. Moreover, this
path in H consists of k vertices.

For each i with 1 ≤ i ≤ k, let y_i be the element of S such that y_i = δ(p_1, q_i).
Recall that q_1 = p_1 and, therefore, y_1 = x_1. Let i_1 be the index such that y_1 ∈ A_{i_1}
and y_2 ∈ B_{i_1}. The path in H corresponding to Q has A_{i_1} as its first vertex. Let ℓ
be such that 1 ≤ ℓ ≤ k − 2 and assume we have already converted (q_1, q_2, ..., q_ℓ)
into a path (A_{i_1}, A_{i_2}, ..., A_{i_ℓ}) in H, where y_ℓ ∈ A_{i_ℓ} and y_{ℓ+1} ∈ B_{i_ℓ}. Let i_{ℓ+1} be
the index such that y_{ℓ+1} ∈ A_{i_{ℓ+1}} and y_{ℓ+2} ∈ B_{i_{ℓ+1}}. Since (q_ℓ, q_{ℓ+1}) is
t-distance-preserving, and since y_ℓ ∈ A_{i_ℓ} and y_{ℓ+1} ∈ B_{i_ℓ}, it follows from Lemma 3 that the
pair (A_{i_ℓ}, B_{i_ℓ}) is (t, ε)-distance-preserving. Moreover, A_{i_{ℓ+1}} ∩ B_{i_ℓ} ≠ ∅, because
y_{ℓ+1} is in the intersection. Therefore, (A_{i_ℓ}, A_{i_{ℓ+1}}) is an edge in H, i.e., we have
converted (q_1, q_2, ..., q_{ℓ+1}) into a path (A_{i_1}, A_{i_2}, ..., A_{i_{ℓ+1}}) in H, where
y_{ℓ+1} ∈ A_{i_{ℓ+1}} and y_{ℓ+2} ∈ B_{i_{ℓ+1}}. We continue extending this path until we have converted
(q_1, q_2, ..., q_{k−1}) into a path (A_{i_1}, A_{i_2}, ..., A_{i_{k−1}}) in H, where y_{k−1} ∈ A_{i_{k−1}} and
y_k ∈ B_{i_{k−1}}. Observe that x_n = δ(p_1, q_k) = y_k ∈ B_{i_{k−1}}. Also, since (q_{k−1}, q_k)
is t-distance-preserving, it follows from Lemma 3 that the pair (A_{i_{k−1}}, B_{i_{k−1}}) is
(t, ε)-distance-preserving. Therefore, (A_{i_{k−1}}, B_{i_{k−1}}) is an edge in H.

Hence, any t-distance-preserving approximation Q = (q_1, q_2, ..., q_k) of P
corresponds to a path (A_{i_1}, A_{i_2}, ..., A_{i_{k−1}}, B_{i_{k−1}}) in H, where A_{i_1} contains x_1
and B_{i_{k−1}} contains x_n.
What about the converse? Let (A_{i_1}, A_{i_2}, ..., A_{i_{k−1}}, B_{i_{k−1}}) be a path in H
such that x_1 ∈ A_{i_1} and x_n ∈ B_{i_{k−1}}. We will convert this path into a polygonal
path Q from p_1 to p_n. Let q_1 := p_1, y_1 := x_1, and let Q be the path consisting
of the single vertex q_1.

Let ℓ be an integer such that 1 ≤ ℓ ≤ k − 2, and assume we have already
converted (A_{i_1}, A_{i_2}, ..., A_{i_ℓ}) into a polygonal path Q = (q_1, q_2, ..., q_ℓ) such that
y_1 ∈ A_{i_1} and y_j := δ(p_1, q_j) ∈ A_{i_j} ∩ B_{i_{j−1}} for each j with 2 ≤ j ≤ ℓ. Consider
the edge (A_{i_ℓ}, A_{i_{ℓ+1}}) in H. We know that A_{i_{ℓ+1}} ∩ B_{i_ℓ} ≠ ∅. Let y_{ℓ+1} be an
arbitrary element of A_{i_{ℓ+1}} ∩ B_{i_ℓ}, and let q_{ℓ+1} be the vertex of P for which
y_{ℓ+1} = δ(p_1, q_{ℓ+1}). We extend Q by the vertex q_{ℓ+1}.

We continue adding vertices to Q until we have converted
(A_{i_1}, A_{i_2}, ..., A_{i_{k−1}}) into a polygonal path Q = (q_1, q_2, ..., q_{k−1}) such that
y_1 ∈ A_{i_1} and y_j := δ(p_1, q_j) ∈ A_{i_j} ∩ B_{i_{j−1}} for each j with 2 ≤ j ≤ k − 1. Consider
the last edge (A_{i_{k−1}}, B_{i_{k−1}}) of the path in H. We know that x_n ∈ B_{i_{k−1}}. Let
q_k := p_n and y_k := x_n. Then y_k = δ(p_1, q_k) ∈ B_{i_{k−1}}. We add the vertex q_k to
Q, which completes the polygonal path between p_1 and p_n.

In conclusion, we have converted the path (A_{i_1}, A_{i_2}, ..., A_{i_{k−1}}, B_{i_{k−1}}) in H,
where x_1 ∈ A_{i_1} and x_n ∈ B_{i_{k−1}}, into a polygonal path Q = (q_1, q_2, ..., q_k) such
that (i) y_1 = x_1 = δ(p_1, q_1) ∈ A_{i_1}, (ii) y_j = δ(p_1, q_j) ∈ A_{i_j} ∩ B_{i_{j−1}} for all j with
2 ≤ j ≤ k − 1, and (iii) y_k = x_n = δ(p_1, q_k) ∈ B_{i_{k−1}}.

Unfortunately, Q need not be a t-distance-preserving approximation of P.
We claim, however, that Q is a ((1 + ε)t)-distance-preserving approximation of
P. To prove this, we let j be an arbitrary index with 1 ≤ j ≤ k − 1. Observe that
y_j ∈ A_{i_j} and y_{j+1} ∈ B_{i_j}. Since the pair (A_{i_j}, B_{i_j}) is (t, ε)-distance-preserving, it
follows from Lemma 3 that the pair (q_j, q_{j+1}) is ((1 + ε)t)-distance-preserving.

Theorem 4. Let P = (p_1, p_2, ..., p_n) be a polygonal path, let t ≥ 1 and 0 <
ε < 1/3 be real numbers, let x_1 = δ(p_1, p_1) and x_n = δ(p_1, p_n), and let H be the
graph as defined above, where the separation ratio is given by (1).

1. Any t-distance-preserving approximation of P consisting of k vertices
corresponds to a path in H from some set containing x_1 to some set containing
x_n that consists of k vertices.

2. Any path in H from some set containing x_1 to some set containing x_n that
consists of k vertices corresponds to a ((1 + ε)t)-distance-preserving
approximation of P consisting of k vertices.

3. Let κ be the minimum number of vertices on any t-distance-preserving
approximation of P, and let R be a shortest path in H between any set
containing x_1 and any set containing x_n. Then R corresponds to a
((1 + ε)t)-distance-preserving approximation of P consisting of at most κ vertices.

4. Assume that for any two distinct vertices p and q of P, δ(p, q)/|pq| ≤ t
or δ(p, q)/|pq| > (1 + ε)t. Then R corresponds to a t-distance-preserving
approximation of P consisting of κ vertices.



3.3 Implementing the Heuristic 

The results in Section 3.2 imply that we can solve Problem 1 heuristically by
computing a shortest path in the graph H between any set A_i that contains
x_1 and any set B_j that contains x_n. Such a shortest path can be found by a
breadth-first search computation. The problem is that the graph H can have up
to Θ(n^2) edges. We will show, however, that we can run (a partial) breadth-first
search without explicitly constructing the entire graph H. The main idea is to
use the fact that the vertices of H correspond to nodes of the split tree T.

Let i and j be two indices such that (A_i, A_j) is an edge in the graph H.
Then, by definition, A_j ∩ B_i ≠ ∅. Since A_j and B_i are represented by nodes of
the split tree, it follows that A_j ⊆ B_i (in which case the node representing A_j is
in the subtree of the node representing B_i) or B_i ⊆ A_j (in which case the node
representing B_i is in the subtree of the node representing A_j).

We are now ready to present the algorithm. The input is the polygonal path
P = (p_1, p_2, ..., p_n) and real numbers t ≥ 1 and 0 < ε < 1/3. The output will
be a ((1 + ε)t)-distance-preserving approximation of P.

Preprocessing:

Step 1: Compute the set S = {x_1, x_2, ..., x_n}, where x_i = δ(p_1, p_i).

Step 2: Compute the split tree T and the corresponding WSPD {A_i, B_i}, 1 ≤ i ≤ m,
for S with separation ratio s = (12 + 24(1 + ε/3)t)/ε.

Step 3: For each i with 1 ≤ i ≤ m, let a_i and b_i be arbitrary elements in A_i and
B_i, respectively. Let f_i and g_i be the vertices of P such that a_i = δ(p_1, f_i) and
b_i = δ(p_1, g_i). If (f_i, g_i) is ((1 + ε/3)t)-distance-preserving, then keep the pair
{A_i, B_i}; otherwise discard it.

For simplicity, we again denote the remaining well-separated pairs by
{A_i, B_i}, 1 ≤ i ≤ m. It follows from Theorem 3 that m = O(sn) = O((t/ε)n)
and that the preprocessing stage takes time O(n log n + sn) = O(n log n + (t/ε)n).
For each i with 1 ≤ i ≤ m, we denote by u_i and v_i the nodes of the split tree
T that represent the sets A_i and B_i, respectively. We designate the nodes u_i as
A-nodes. We will identify each set A_i with the corresponding node u_i and each
set B_i with the corresponding node v_i. Hence, the graph H in which we perform
the breadth-first search will have the nodes of the split tree as its vertices.
Observe that for a node w of T, there may be several indices i and several indices
j such that u_i = w and v_j = w.

The implicit breadth-first search algorithm: Our algorithm will be a modified
version of the breadth-first search algorithm as described in Cormen et
al. [6]. It computes a breadth-first forest consisting of breadth-first trees rooted
at the A-nodes u_i for which x_1 ∈ A_i. The breadth-first search terminates as
soon as an A-node u_i is reached such that x_n ∈ B_i. For each node w of the split
tree, the algorithm maintains three variables: (i) color(w), whose value is either
white, gray, or black, (ii) dist(w), whose value is the distance in H from any set
A_i containing x_1 to the set corresponding to w, as computed by the algorithm,
and (iii) π(w), whose value is the predecessor of w in the breadth-first forest.

Step 1: For each node w of T, set color(w) := white, dist(w) := ∞, and
π(w) := nil.

Step 2: Initialize an empty queue Q. Starting at the leaf of T storing x_1, walk
up the tree to the root. For each node w encountered, set color(w) := gray and,
if w is an A-node, set dist(w) := 0, and add w to the end of Q.
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Step 3 : Let w be the first element of Q. Delete w from Q and set color{w) := 
black. For each index i such that Ui = w, do the following. If € Bi, then set 
dist{vi) := dist{w) + 1, := w, z := Vi, and go to Step 4. If ^ Bi and 

color{vi) = white, then perform Steps 3.1 and 3.2. 

Step 3 . 1 : Starting at node vt, walk up the split tree until the first non-white 
node is reached. For each white node w' encountered, set color{w') := gray and, 
if w' is an ^-node, set dist{w') := dist{w) + 1, t:{w') := w, and add w' to the 
end of Q. 

Step 3 . 2 : Visit all nodes in the subtree of Vi. For each node w' in this subtree, set 
color{w') := gray and, if w' is an A-node, set dist{w') := dist{w) + l, := w, 

and add w' to the end of Q. 

After all indices i with m = w have been processed, go to Step 3. 

Step 4: Compute the path (z, 7r(z), 7r^(z), . . . , 7 t^“^(z)) of nodes in T, where 
k = dist{z) + 1. 

Step 5 : Use the algorithm of Section 3.2 to convert the path obtained in Step 4 
into a polygonal path. 
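
Steps 1 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the split tree is assumed to be given as hypothetical `parent` and `children` dictionaries, the pairs as a list of `(u_i, v_i)` tree nodes, and the function name `implicit_bfs` is invented for illustration. Two small liberties are taken: Step 3.2 visits only the proper descendants of $v_i$ (since $v_i$ itself is already handled in Step 3.1), and the path of Step 4 is assembled directly from the predecessors of $w$ instead of first storing $\pi(z)$.

```python
from collections import deque

WHITE, GRAY, BLACK = "white", "gray", "black"


def implicit_bfs(parent, children, pairs, leaf_x1, leaf_xn):
    """Return the node path (z, pi(z), pi^2(z), ...) of Step 4, or None.

    parent   : dict mapping every split-tree node to its parent (root -> None)
    children : dict mapping every split-tree node to a list of its children
    pairs    : list of (u_i, v_i) tree nodes, one per well-separated pair
    leaf_x1, leaf_xn : the leaves storing x_1 and x_n
    """
    # Step 1: initialise colour, distance and predecessor of every node.
    color = {w: WHITE for w in parent}
    dist = {w: float("inf") for w in parent}
    pred = {w: None for w in parent}

    a_nodes = {u for u, _ in pairs}
    partners = {}                       # w -> all v_i with u_i = w
    for u, v in pairs:
        partners.setdefault(u, []).append(v)

    # x_n lies in B_i exactly when v_i is leaf_xn or one of its ancestors.
    contains_xn = set()
    w = leaf_xn
    while w is not None:
        contains_xn.add(w)
        w = parent[w]

    def reach(node, source, queue):
        """Colour a node gray; if it is an A-node, record it and enqueue it."""
        color[node] = GRAY
        if node in a_nodes:
            dist[node] = dist[source] + 1
            pred[node] = source
            queue.append(node)

    # Step 2: walk from the leaf of x_1 up to the root.
    queue = deque()
    w = leaf_x1
    while w is not None:
        color[w] = GRAY
        if w in a_nodes:
            dist[w] = 0
            queue.append(w)
        w = parent[w]

    # Step 3: process the queue.
    while queue:
        w = queue.popleft()
        color[w] = BLACK
        for v in partners.get(w, []):
            if v in contains_xn:
                # Step 4: z = v, followed by w and the predecessors of w.
                path = [v]
                node = w
                while node is not None:
                    path.append(node)
                    node = pred[node]
                return path
            if color[v] != WHITE:
                continue
            # Step 3.1: walk up from v_i until the first non-white node.
            node = v
            while node is not None and color[node] == WHITE:
                reach(node, w, queue)
                node = parent[node]
            # Step 3.2: visit the proper descendants of v_i.
            stack = list(children[v])
            while stack:
                node = stack.pop()
                if color[node] == WHITE:
                    reach(node, w, queue)
                stack.extend(children[node])
    return None
```

Because each tree node turns gray at most once, the walks in Steps 3.1 and 3.2 touch every tree edge a bounded number of times, which is what the running-time argument relies on.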

Observe that, if $w'$ is the first non-white node reached in Step 3.1, all nodes on
the path from $w'$ to the root of the split tree are non-white. Also, if color($v_i$) =
white, then all nodes in the subtree of $v_i$ (these are visited in Step 3.2) are white.
Using these observations, an analysis similar to the one in Cormen et al. [6] shows
that the path obtained in Step 4 is a shortest path in $H$ between any set $A_i$
containing $x_1$ and any set $B_j$ containing $x_n$. Hence, by Theorem 4, the polygonal
path obtained in Step 5 is a $((1 + \epsilon)t)$-distance-preserving approximation of the
input path $P$. To estimate the running time of the algorithm, first observe that
both Steps 1 and 2 take $O(n)$ time. Steps 4 and 5 both take $O(n)$ time, because
the path reported consists of at most $n$ nodes. It remains to analyze Step 3.
The total time for Step 3 is proportional to the sum of $m$ and the total time
for walking through the split tree $T$ in Steps 3.1 and 3.2. It follows from the
algorithm that each edge of $T$ is traversed at most once. Therefore, Step 3 takes
$O(m + n)$ time. We have shown that the total running time of the algorithm is
$O(m + n) = O(sn) = O((t/\epsilon)n)$.

Theorem 5. Let $P = (p_1, p_2, \ldots, p_n)$ be a polygonal path, let $t \ge 1$ and $0 <
\epsilon < 1/3$ be real numbers, and let $\kappa$ be the minimum number of vertices on any
$t$-distance-preserving approximation of $P$.

1. In $O(n \log n + (t/\epsilon)n)$ time, we can compute a $((1 + \epsilon)t)$-distance-preserving
approximation $Q$ of $P$, having at most $\kappa$ vertices.

2. If $\delta(p, q)/|pq| \le t$ or $\delta(p, q)/|pq| > (1 + \epsilon)t$ for all distinct vertices $p$ and $q$ of
$P$, then $Q$ is a $t$-distance-preserving approximation of $P$ having $\kappa$ vertices.

4 An Approximation Algorithm for Problem 2 

Recall that, for any real number $t \ge 1$, we denote by $\kappa_t$ the minimum number
of vertices on any $t$-distance-preserving approximation of the polygonal path
$P = (p_1, p_2, \ldots, p_n)$. Let $k$ be a fixed integer with $2 \le k \le n$, and let $t^* :=
\min\{t \ge 1 : \kappa_t \le k\}$. In this section, we present an algorithm that computes
an approximation to $t^*$. Our algorithm will perform a binary search, which is
possible because of the following lemma.

Lemma 4. Let $t \ge 1$ and $0 < \epsilon < 1/3$ be real numbers, let $Q$ be the output of
the algorithm of Theorem 5, and let $k'$ be the number of vertices of $Q$. If $k' \le k$,
then $t^* \le (1 + \epsilon)t$. If $k' > k$, then $t^* > t$.

We first run a standard doubling algorithm to compute a real number $\tau$ that
approximates $t^*$ within a factor of two. To be more precise, starting with $t = 2$,
we do the following. Run the algorithm of Theorem 5 and let $k'$ be the number of
vertices in the output $Q$ of this algorithm. If $k' > k$, then repeat with $t$ replaced
by $2t$. If $k' \le k$, then terminate, set $\tau := t$, and return the value of $\tau$. It follows
from Lemma 4 that

$\tau/2 \le t^* \le (1 + \epsilon)\tau$. (2)

Observe that the algorithm of Theorem 5 computes, among other things, a
split tree $T$ and a WSPD with a separation ratio $s$ that depends on $t$ and $\epsilon$.
Moreover, observe that $T$ does not depend on $s$. Hence, it suffices to compute
the split tree only once. Therefore, it follows from Theorem 5 that, when given
$T$, the time to compute $\tau$ is $O(\sum_{i=1}^{\log \tau} (2^i/\epsilon)n) = O((\tau/\epsilon)n) = O((t^*/\epsilon)n)$.
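
As a quick check of this bound, assume (as the doubling loop suggests) that the algorithm of Theorem 5 is run once for each $t = 2, 4, \ldots, \tau$ on the precomputed split tree, at a cost of $O((t/\epsilon)n)$ per run. The total is then a geometric sum:

```latex
\sum_{i=1}^{\log_2 \tau} \frac{2^i}{\epsilon}\, n
  \;=\; \frac{(2\tau - 2)\, n}{\epsilon}
  \;=\; O\!\left(\frac{\tau}{\epsilon}\, n\right)
```

The last bound in the text, $O((t^*/\epsilon)n)$, then uses the fact that $\tau$ is at most $2t^*$.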

We now show how to use binary search to compute a better approximation.
Let $S = \{x_1, x_2, \ldots, x_n\}$, where $x_i = \delta(p_1, p_i)$, $1 \le i \le n$, and let $\{A_i, B_i\}$,
$1 \le i \le m$, be the WSPD of $S$ with separation ratio $s = (4 + 8(1 + \epsilon)^2\tau)/\epsilon$.
For each $i$ with $1 \le i \le m$, let $a_i$ and $b_i$ be arbitrary elements of $A_i$ and $B_i$,
respectively, let $f_i$ and $g_i$ be the vertices of $P$ such that $a_i = \delta(p_1, f_i)$ and
$b_i = \delta(p_1, g_i)$, and let $t_i := \delta(f_i, g_i)/|f_i g_i|$. The following lemma states that, in
order to approximate $t^*$, it suffices to search among the values $t_i$, $1 \le i \le m$.

Lemma 5. There exists an index $j$ with $1 \le j \le m$, such that $t_j/(1 + \epsilon) \le t^* \le
(1 + \epsilon)t_j$.

Proof. We only prove the first inequality. We have seen in Section 2 that there
exist two distinct vertices $p$ and $q$ of $P$ such that $t^* = \delta(p, q)/|pq|$. Let $j$ be the
index such that $\delta(p_1, p) \in A_j$ and $\delta(p_1, q) \in B_j$.

Since the pair $(p, q)$ is $t^*$-distance-preserving, and since $4st^* + 16t^* < s^2$, we
know from Lemma 2 that the pair $(f_j, g_j)$ is $t'$-distance-preserving, where

$t' = \frac{(1 + 4/s)\,t^*}{1 - 4(1 + 4/s)\,t^*/s}$.

Since $s \ge 4$, we have $t' \le (1 + 4/s)t^*/(1 - 8t^*/s)$. By our choice of $s$ and by (2),
we have $s \ge (4 + 8(1 + \epsilon)t^*)/\epsilon$, which is equivalent to $(1 + 4/s)t^*/(1 - 8t^*/s) \le
(1 + \epsilon)t^*$. This proves that $t_j = \delta(f_j, g_j)/|f_j g_j| \le t' \le (1 + \epsilon)t^*$. □
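
The equivalence used in the last step of the proof can be checked directly. The following is a sketch of the rearrangement; it assumes only that $1 - 8t^*/s > 0$, which follows from the lower bound on $s$ because $\epsilon < 1$:

```latex
\begin{align*}
\frac{(1 + 4/s)\,t^*}{1 - 8t^*/s} \le (1+\epsilon)\,t^*
  &\iff 1 + \frac{4}{s} \le (1+\epsilon)\Bigl(1 - \frac{8t^*}{s}\Bigr) \\
  &\iff \frac{4 + 8(1+\epsilon)\,t^*}{s} \le \epsilon \\
  &\iff s \ge \frac{4 + 8(1+\epsilon)\,t^*}{\epsilon}.
\end{align*}
```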

We proceed as follows. Define $t_0 := 1$, sort the values $t_i$, $0 \le i \le m$, remove
duplicates, and discard those values that are larger than $(1 + \epsilon)^2\tau$. For simplicity,
we denote the remaining sorted sequence by

$1 = t_0 < t_1 < t_2 < \cdots < t_m \le (1 + \epsilon)^2\tau$. (3)

We perform a binary search in this sorted sequence, maintaining the following
invariant.

Invariant: $\ell$ and $r$ are integers such that $0 \le \ell < r \le m$ and $t_\ell \le t^* \le (1 + \epsilon)t_r$.

Initially, we set $\ell := 0$ and $r := m$. To prove that at this moment, the
invariant holds, consider the index $j$ in Lemma 5. Observe that since $t_j \le (1 +
\epsilon)t^* \le (1 + \epsilon)^2\tau$, the value $t_j$ occurs in the sorted sequence (3). Therefore,
$t_\ell = 1 \le t^* \le (1 + \epsilon)t_j \le (1 + \epsilon)t_r$.

Assume that $\ell < r - 1$. Then we use Lemma 4, with $t = t_h$ where $h =
\lfloor(\ell + r)/2\rfloor$, to decide if $t^* \le (1 + \epsilon)t_h$ or $t^* > t_h$. In the first case, we set $r := h$,
whereas in the second case, we set $\ell := h$. Observe that, in both cases, the
invariant is correctly maintained.

We continue making these binary search steps until $\ell = r - 1$. At this moment,
we have $t_\ell \le t^* \le (1 + \epsilon)t_{\ell+1}$. We now use Lemma 4, with $t = (1 + \epsilon)t_\ell$, to decide
if $t^* \le (1 + \epsilon)^2 t_\ell$ or $t^* > (1 + \epsilon)t_\ell$. In the first case, we return the value $t_\ell$,
which satisfies $t_\ell \le t^* \le (1 + \epsilon)^2 t_\ell$. Assume that $t^* > (1 + \epsilon)t_\ell$. We claim that
$t^* \ge t_{\ell+1}/(1 + \epsilon)$. This will imply that $t_{\ell+1}/(1 + \epsilon) \le t^* \le (1 + \epsilon)t_{\ell+1}$ and,
therefore, we return the value $t_{\ell+1}/(1 + \epsilon)$. To prove the claim, consider again
the index $j$ in Lemma 5. We have $t_j \ge t^*/(1 + \epsilon) > t_\ell$ and thus $t_j \ge t_{\ell+1}$. It
follows that $t^* \ge t_j/(1 + \epsilon) \ge t_{\ell+1}/(1 + \epsilon)$.

We have shown that the algorithm returns a value $t$ such that $t \le t^* \le
(1 + \epsilon)^2 t$. If we run the entire algorithm with $\epsilon$ replaced by $\epsilon/3$, then we obtain
a value $t$ such that $t \le t^* \le (1 + \epsilon/3)^2 t \le (1 + \epsilon)t$.

Theorem 6. Let $P = (p_1, p_2, \ldots, p_n)$ be a polygonal path, let $k$ be an integer
with $2 \le k \le n$, let $t^*$ be the minimum value of $t$ for which a $t$-distance-preserving
approximation of $P$ having at most $k$ vertices exists, and let $0 < \epsilon < 1$. In
$O((t^*/\epsilon)n \log n)$ time, we can compute a real number $t$ such that $t \le t^* \le (1 + \epsilon)t$.
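
The doubling phase and the binary search can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `fits` is a hypothetical oracle standing in for one run of the algorithm of Theorem 5 (it returns True when the output has at most $k$ vertices, so Lemma 4 applies), and `candidates` stands in for the values $t_1, \ldots, t_m$. In the text those values come from a WSPD whose separation ratio depends on $\tau$; here they are simply passed in.

```python
def approximate_t_star(fits, candidates, eps):
    """Return t with t <= t* <= (1 + eps)**2 * t.

    fits(t)    : hypothetical oracle; True means k' <= k for this t,
                 hence t* <= (1 + eps) * t; False means t* > t (Lemma 4)
    candidates : the values t_1, ..., t_m (each at least 1)
    eps        : the approximation parameter, 0 < eps < 1/3
    """
    # Doubling phase: try t = 2, 4, 8, ... until the oracle accepts.
    tau = 2.0
    while not fits(tau):
        tau *= 2.0

    # Sorted sequence (3): t_0 = 1 plus all candidates up to (1 + eps)^2 * tau.
    bound = (1 + eps) ** 2 * tau
    ts = sorted({1.0} | {t for t in candidates if 1.0 < t <= bound})
    if len(ts) == 1:
        # Only t_0 = 1 survives, so Lemma 5 gives t* <= 1 + eps.
        return 1.0

    # Invariant: ts[lo] <= t* <= (1 + eps) * ts[hi].
    lo, hi = 0, len(ts) - 1
    while hi - lo > 1:
        h = (lo + hi) // 2
        if fits(ts[h]):
            hi = h          # t* <= (1 + eps) * t_h
        else:
            lo = h          # t* > t_h

    # Final test with t = (1 + eps) * t_lo.
    if fits((1 + eps) * ts[lo]):
        return ts[lo]
    return ts[hi] / (1 + eps)
```

Calling the function with `eps / 3` in place of `eps` gives the guarantee $t \le t^* \le (1 + \epsilon)t$ stated in the theorem.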



5 Experimental Results 

In this section, we will briefly discuss some experimental results that we obtained 
by implementing the algorithms presented in Sections 2 and 3. The experiments 
were done by running the programs on paths containing between 100 and 50,000 
points. The shorter paths are from the Spanish railroad network and the longer 
paths (more than 3,000 points) were constructed by joining several shorter paths. 

The exact algorithm: First we consider the results obtained by running the exact 
algorithm on the input paths with different values of t. The most striking result 
is that the running times and numbers of edges seem to be independent of t. The 
running time of the algorithm was between 94 and 98 seconds for an input path 
containing 20,000 points for different values of t. Even though one might expect 
that the algorithm would not be heavily dependent on t, it is surprising that 
the difference is so small. The explanation is probably that the time to perform 
a breadth-first search depends on the length of the optimal solution (the depth 
of the search tree) and the number of $t$-distance preserving edges (affecting the
width of the tree). If $t$ is large, the number of $t$-distance preserving edges is large
and hence the width of the search tree is large, whereas if $t$ is small, the optimal
solution is long and hence the search tree is deep (but not very wide).

Fig. 1. The topmost figure is the original path with 430 points; in the middle and at
the bottom we have two simplifications containing 126 and 22 points obtained from
the heuristic using $\epsilon = 0.05$, and $t = 1.05$ and $t = 1.2$, respectively.

The heuristic: Just as for the exact algorithm, the running time of the heuristic
is not sensitive to $t$, for reasons similar to the ones discussed above. On the other
hand, the running time decreases when $\epsilon$ is increasing, since the number of
well-separated pairs decreases. In the tests we performed, the number of pairs and
the running time increased between two and four times when $\epsilon$ was decreased
from 0.05 to 0.01 (for instances containing more than 2,000 points).

The well-separated pair decomposition allows us to disregard the majority
of the edges in the breadth-first search, which is the reason why the heuristic
is faster than the exact algorithm. However, this “pruning” step is quite costly.
Comparing the construction of the well-separated pair decomposition with the
breadth-first search shows that the former uses almost 98% of the total running
time. Is there a way to perform the pruning more efficiently?

Comparing the algorithms: The table below shows some of the running times
(in seconds) of three experiments using $t = 1.1$: the exact algorithm and
the heuristic with $\epsilon = 0.01$ and $\epsilon = 0.1$. It is clear from the table that
the exact algorithm quickly becomes impractical when the size of the input
grows. Processing 50,000 points takes approximately 600 seconds for the exact
algorithm while the same number of points can be processed in 43 and 11
seconds, respectively, by the heuristic. Plotting the running times clearly shows
a difference in their asymptotic behavior which is obviously due to the use of
the well-separated pair decomposition which, as mentioned above, “prunes” the
search tree. For example, when the input path consisted of 50,000 points, the
exact algorithm “looked” at almost 1.2 billion edges, while the well-separated
pair decomposition removed all but 36 million edges (for $\epsilon = 0.01$) and 10
million edges (for $\epsilon = 0.1$). For 20,000 points the numbers were 196 million
versus 16 million and 4 million. This corroborates the power of using the
well-separated pair decomposition for this kind of problem.



# Points                       100     500     2,000   8,000   20,000  50,000
Exact                          < 1s    < 1s    1s      16s     98s     617s
Heuristic ($\epsilon = 0.01$)  < 1s    < 1s    3s      13s     20s     43s
Heuristic ($\epsilon = 0.1$)   < 1s    < 1s    1s      3s      5s      11s



Abstract. We propose a natural scheme to measure the (so-called) joint
separation of a cluster of objects in general geometric settings. In particular,
here the measure is developed for finite sets of planes in $\mathbb{R}^3$ in terms
of extreme configurations of vectors on the planes of a given set. We prove 
geometric and graph-theoretic results about extreme configurations on 
arbitrary finite plane sets. We then specialize to the planes bounding a 
regular polyhedron in order to exploit the symmetries. However, even 
then results are non-trivial and surprising - extreme configurations on 
regular polyhedra may turn out to be highly irregular. 



1 Introduction 

In [3] the question arose that if three vectors are arbitrarily chosen, lying one
each on the three co-ordinate planes, what is the largest angle $\theta$ such that, in all
instances, at least two of the vectors have an angle of at least $\theta$ between them.
The answer of $\pi/3$ is not hard to see given the symmetries of the co-ordinate
planes. It is natural to extend the question to arbitrary finite sets of planes. This 
leads to a measure of joint separation of such sets in terms of certain extreme 
configurations of vectors on the planes belonging to a given set. The measure, in 
fact, generalizes in a natural manner to clusters of objects in various geometric 
settings. As far as we are aware, such a measure of joint separation has not been 
investigated before. 

In Section 2 we give our definition of a measure of joint separation of a
finite set of planes in $\mathbb{R}^3$. Computing this measure for arbitrary finite plane
sets in $\mathbb{R}^3$ seems hard but we prove generally applicable geometric and
graph-theoretic results. We specialize in Section 3 to the planes bounding a regular
polyhedron in order to exploit the symmetries. Even then results are non-trivial 
and surprising - extreme configurations on regular polyhedra may turn out to 
be highly irregular. 

Other than the case of the co-ordinate planes mentioned above we do not
yet know of applications of the joint separation of sets of planes. However, regular
polyhedra are geometric objects of such fundamental importance that we hope this
measure for their bounding faces is of interest in itself. Computational
techniques developed in this context are novel as well.

The notion of joint separation is natural enough that one expects applications
in various domains. In fact, in Section 4 we formulate a general definition of the
measure and implement it in different settings including practical ones related,
for example, to databases and facility location.

Several interesting questions for further study arise throughout. 



2 Planes 

Definitions. The containing space is assumed always to be $\mathbb{R}^3$.

Let $F$ be a finite set of planes of cardinality $n \ge 2$. A spanning configuration
of vectors (or, simply, configuration) $C$ on $F$ is a set of $n$ non-null vectors, lying
one each on a plane in $F$; in other words, there is a one-to-one correspondence
between $C$ and $F$ so that each vector in $C$ lies on its corresponding plane in $F$.
Parallel-translates of a given plane or vector are considered identical, and both
$F$ and $C$ are allowed to contain duplicates. See Figures 1(a) and (c).

The angle between a pair of non-null vectors $u$ and $v$ is $\angle uv = \cos^{-1} \frac{u \cdot v}{|u||v|}$,
where $0 \le \angle uv \le \pi$.

Given a configuration $C$ on $F$, the maximal angle of the configuration is the
maximum amongst angles between pairs of vectors from $C$; denote it by max($C$).
E.g., in Figure 1(a), it is $2\pi/3$. Remark: In determining max($C$) the length of
each vector is immaterial - particularly, any vector in $C$ may be replaced by a
parallel one (i.e., a positive multiple) - it is directions that matter. The maximal
angle is evidently the diameter of $C$ using an appropriate metric.

An extreme configuration on $F$ is a configuration $C'$ such that

max($C'$) = inf{max($C$) : $C$ is a configuration on $F$}.

Straightforward compactness and continuity arguments show that extreme
configurations exist as $F$ is finite. If $C'$ is an extreme configuration on $F$, call
max($C'$) the minmax angle of $F$; denote it by minmax($F$).

Therefore, a defining property of minmax($F$) is that it is the largest angle $\theta$
such that, if $n$ vectors are chosen, one each on a plane of $F$, then at least two
of the vectors have an angle of at least $\theta$ between them. It is a measure of joint
separation of $F$.
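
To make the definitions concrete, the following Python sketch computes the maximal angle of a configuration. The function names `angle` and `maximal_angle` are hypothetical, and the two sample configurations on the three co-ordinate planes are chosen for illustration only: the unit axis vectors, and the diagonal vectors at 45° to the axes.

```python
import math
from itertools import combinations


def angle(u, v):
    """Angle in radians between two non-null vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def maximal_angle(config):
    """max(C): the largest angle between any pair of vectors in config."""
    return max(angle(u, v) for u, v in combinations(config, 2))


# One vector on each of the xy-, yz- and zx-planes.
axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
diagonals = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]

print(math.degrees(maximal_angle(axes)))       # pairwise orthogonal vectors
print(math.degrees(maximal_angle(diagonals)))  # a smaller maximal angle
```

Since the second configuration has a smaller maximal angle than the first, the first cannot be extreme; minmax($F$) is the infimum of this quantity over all configurations.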

If the startpoint of a unit vector $u$ is located at the origin then the endpoint
of $u$ lies on the unit sphere $S^2$. Since $u$ is uniquely identified by this endpoint
we shall not distinguish between unit vectors and points on $S^2$. A unit vector
trajectory (or, simply, trajectory) is a regular $C^\infty$ curve (see [5]) $c : [a, b] \to S^2$
($b > a$) - the trajectory starts at $c(a)$ and ends at $c(b)$. If $p$ is a plane such that
$c(t) \in p$, $\forall t \in [a, b]$, then the trajectory is said to lie on $p$. A perturbation of a set
of unit vectors $\{v_i\}$ is a set of trajectories $\{c_i\}$, such that $c_i$ starts at $v_i$ for each
$i$. Remark: In the following, vectors in a configuration are often required to be
of unit length to avoid technicalities and to allow application of perturbations
in the manner defined.

Lemma 1. Suppose that $v_1$ and $v_2$ are unit vectors lying on planes $p_1$ and $p_2$,
resp., such that the angle between them is $\theta > 0$. Then the following hold:

1. There are trajectories $c_i : [0, \epsilon_i] \to S^2$, starting at $v_i$ and lying on $p_i$, for
$1 \le i \le 2$, such that the angle between $c_1(t)$ and $c_2(t)$ is less than $\theta$ for
$0 < t < \min \epsilon_i$.

2. If the projection of $v_2$ on $p_1$ is null or parallel to $v_1$, then for any trajectory
$c_1 : [0, \epsilon_1] \to S^2$, starting at $v_1$ and lying on $p_1$, the angle between $c_1(t)$ and
$v_2$ is always at least $\theta$.

Fig. 1. (a) and (c): Configurations on the planes containing faces of a regular
polyhedron; (b) and (d): Graphs of (a) and (c), respectively - the subgraph of maximal edges
is bold; (e) A graph of some configuration, weights not shown, where the subgraph
(bold) of maximal edges is acyclic; (f) An extreme configuration on the 3 co-ordinate
planes.

Roughly speaking, non-parallel non-null vectors $v_1$ and $v_2$ may always be rotated
simultaneously on their respective planes to bring them (angularly) closer
together, but there exist configurations where they may not be brought closer
together by rotating only one of them.

Proof. It is convenient to picture $v_1$ and $v_2$ originating from the same point $O$
on a line common to $p_1$ and $p_2$. The proof then follows elementary geometric
observations. We omit details here. □

Definitions. Let $C = \{v_1, v_2, \ldots, v_n\}$ be a configuration on a finite set of planes
$F$. Construct a complete undirected graph $G_C$ on $n$ vertices named “$g_1$”, “$g_2$”,
..., “$g_n$”, with the weight on the edge joining vertices $g_i$ and $g_j$ being the angle
between the vectors $v_i$ and $v_j$. Call $G_C$ the graph of $C$. Call edges of maximal
weight in $G_C$ maximal edges. Denote by $G'_C$ the subgraph of $G_C$ consisting exactly
of the maximal edges and their endvertices; call it the subgraph of maximal
edges. See Figures 1(b) and (d). Remark: We shall often refer to vertices of $G_C$
as if labeled by vectors of $C$ - the reason we do not use the vectors themselves
to name vertices is that there may be duplicates in a configuration.

If $C$ is an extreme configuration on $F$, obviously the weight of a maximal
edge of $G_C$ is equal to minmax($F$), and the weights of other edges are strictly less.

Lemma 2. If $F$ is a finite set of at least 3 planes and $C$ is an extreme
configuration on $F$, then the subgraph of maximal edges $G'_C$ contains at least one
(simple) cycle; in other words, $G_C$ contains a cycle of maximal edges.

Proof. If minmax($F$) = 0, the lemma holds trivially as all edges of $G_C$, for an
extreme configuration $C$, have weight 0.

Assume then that minmax($F$) > 0. We claim that, if $G'_C$ is acyclic, then $C$
cannot be extreme. In fact, we shall prove the following:

Sublemma. Let $F = \{p_1, \ldots, p_n\}$ and minmax($F$) > 0. Let $C = \{v_1, \ldots, v_n\}$,
where each vector $v_i$ is of unit length and lies on $p_i$, $1 \le i \le n$, be a configuration
on $F$ such that $G'_C$ is acyclic. Suppose, by renaming, if necessary, that the set
of vertices of $G'_C$ is $\{g_1, \ldots, g_m\}$, $m \le n$.

Then there are trajectories $c_i : [0, \epsilon_i] \to S^2$, starting at $v_i$ and lying on $p_i$,
for $1 \le i \le m$, such that max($C(t)$) < max($C$), for $0 < t < \min \epsilon_i$, where the
configuration

$C(t) = \{c_1(t), \ldots, c_m(t), v_{m+1}, \ldots, v_n\}$.

In the following and elsewhere “rotation” is often used informally - the use 
may be formalized in terms of trajectories. 

Proof. The proof is by induction on the number $m$ of vertices in $G'_C$, which, of
course, is at least 2. If $m = 2$, use item 1 of Lemma 1 to begin the induction -
observing that sufficiently small rotations of $v_1$ and $v_2$ will not change the weight
of any initially non-maximal edge enough for that edge to become maximal. See
for example Figures 1(c) and (d), where $v_1$ and $v_2$ may be rotated simultaneously
on their respective planes to reduce the angle, call it $\alpha$ (originally $2\pi/3$), between
them, but without letting $\alpha$ drop below any of the other five angle weights.

Assume, inductively, that the claim is true if the number of vertices in $G'_C$
is $m \le M$, for a given $M \ge 2$. Suppose now that $C$ is a configuration on
$F$ so that $G'_C = \{g_1, \ldots, g_{M+1}\}$ has $M + 1$ vertices. Consider a leaf vertex,
w.l.o.g., say $g_{M+1}$ of $G'_C$. See Figure 1(e). First, apply the inductive hypothesis
to the configuration $C - \{v_{M+1}\}$ on the set of planes $F - \{p_{M+1}\}$ to obtain
vector trajectories $c_i : [0, \epsilon_i] \to S^2$, for $1 \le i \le M$, satisfying the claim of the
sublemma.

Now, applying item 1 of Lemma 1 to the pair $v_M$ and $v_{M+1}$, and possibly
truncating and reparametrizing the domains of the trajectories $c_i$, one can construct
trajectories $c'_i : [0, \epsilon'_i] \to S^2$, for $1 \le i \le M + 1$, satisfying the claim of the
sublemma - observing again that a sufficiently small rotation of $v_{M+1}$ will not
change the weight of any initially non-maximal edge adjacent to $v_{M+1}$ enough
for that edge to become maximal.

It is in the last inductive step above that we make crucial use of the regularity
of trajectories. If, for example, the application of the inductive hypothesis were
allowed to produce trajectories $c_i$, $1 \le i \le M$, where $c_M$ is stationary, then
it might not be possible to construct a trajectory $c'_{M+1}$ satisfying the claim
of the sublemma (see item 2 of Lemma 1). However, as $c_M$ is not stationary,
the projection of $c_M(t)$ on $p_{M+1}$ will be neither null nor parallel to $v_{M+1}$, for
$0 < t < \delta$ and sufficiently small $\delta$, allowing application of item 1 of Lemma 1 to
the pair $v_M$, $v_{M+1}$. □□

Lemma 3. Let $F$ be a finite set of planes and $C$ a configuration of unit vectors
on $F$ such that $g_{i_1}, g_{i_2}, \ldots, g_{i_m}, g_{i_{m+1}} = g_{i_1}$ is a simple cycle of length four or
more in $G'_C$. Then, of the $m$ ($\ge 4$) vectors $v_{i_1}, v_{i_2}, \ldots, v_{i_m}$, at most three are
distinct; in other words, if $G_C$ contains a simple cycle of maximal edges, at most
three of the vectors (precisely, vector labels) on that cycle are distinct.

Proof. Suppose first that $G'_C$ contains a simple 4-cycle $g_{i_1}, g_{i_2}, g_{i_3}, g_{i_4}, g_{i_1}$ such
that, if possible, all four vectors $v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4}$ are distinct. Parallel-translate
the vectors $v_{i_k}$, $1 \le k \le 4$, so that their startpoints are all located at one point
$O$. As the angles between successive pairs are equal, the endpoints of $v_{i_k}$, call
them $P_{i_k}$, $1 \le k \le 4$, form the vertices of a spherical rhombus on a unit sphere
centered at $O$, i.e., the great circle arcs joining $P_{i_1}P_{i_2}$, $P_{i_2}P_{i_3}$, $P_{i_3}P_{i_4}$, $P_{i_4}P_{i_1}$ are all
of equal length. See Figure 2(a). It may be verified that, in a spherical rhombus,
at least one of the two great circle diagonals joining opposite vertices is of greater
length than the arc length of a side. This leads to the conclusion that one of the
angles between either $v_{i_1}$ and $v_{i_3}$ or $v_{i_2}$ and $v_{i_4}$ is greater than the angle between
a successive pair of $v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4}$, contradicting that $g_{i_1}, g_{i_2}, g_{i_3}, g_{i_4}, g_{i_1}$ is a
cycle in $G'_C$ (with all edge weights maximal in $G_C$). It follows that the initial
hypothesis that all four of $v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4}$ are distinct cannot hold. Cases of cycles
of length greater than 4 may be argued similarly.

See Figure 1(b) for an example of a 4-cycle $g_1, g_2, g_3, g_4, g_1$ of maximal edges
where only three of the vectors are distinct. □




Fig. 2. (a) A spherical rhombus at the endpoints of a cycle of maximal edges; (b) 
Standard axes (bold) and a regular equicycle {AA' , BB' ,CC'} on F; (c) An extreme 
configuration on F; (d) and (e): Graphs of configurations for Lemma 5 - subgraph of 
maximal edges is bold 



3 Regular Polyhedra 

If $F$ is the set of planes containing the faces of a polyhedron $P$, then a
configuration on $F$ is simply called a configuration on $P$. Accordingly, define
minmax($P$) = minmax($F$). Henceforth, if $f$ denotes a face of a polyhedron,
it shall also denote the plane containing that face - ambiguity will not be a
problem.

We wish to determine extreme configurations (modulo rigid motions of $\mathbb{R}^3$)
and the minmax angle for regular polyhedra. Clearly, these depend only on the
type of the given regular polyhedron, i.e., its number of faces, and not its size.

The cube is the simplest. Observe first that the case of the cube reduces 
to finding extreme configurations on the set of three co-ordinate planes: as the 
(axis-parallel) cube has two faces parallel to each co-ordinate plane, one need 
only duplicate each vector from an extreme configuration on the co-ordinate 
planes. 

Extreme configurations on the three co-ordinate planes may be found by 
elementary means (as in [3]): they are precisely non-null vectors, one on each 
plane, at 45° to the axes, so that all three vectors may be parallel-translated to 
lie on the walls of a single octant with each startpoint at the origin. The angle 
between any two is then 60°. See Figure 1(f). Omitting further detail: 
Theorem 1. The minmax angle of a cube is 60°. □ 
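
A coarse numerical check of the co-ordinate-plane case can be sketched in Python. This is only a sanity check, not a proof, and everything in it is an assumption made for illustration: the function names, the parametrisation of each unit vector by a single angle in its plane, and the 15° grid (chosen so that the 45° directions described above lie on the grid).

```python
import math
from itertools import product


def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def grid_minmax(step_deg=15):
    """Smallest maximal angle over a grid of configurations on the
    xy-, yz- and zx-planes, one unit vector per plane."""
    thetas = [math.radians(d) for d in range(0, 360, step_deg)]
    on_xy = [(math.cos(t), math.sin(t), 0.0) for t in thetas]
    on_yz = [(0.0, math.cos(t), math.sin(t)) for t in thetas]
    on_zx = [(math.sin(t), 0.0, math.cos(t)) for t in thetas]
    best = 180.0
    for u, v, w in product(on_xy, on_yz, on_zx):
        worst = max(angle_deg(u, v), angle_deg(v, w), angle_deg(u, w))
        best = min(best, worst)
    return best


print(grid_minmax())  # compare with the 60 degrees of Theorem 1
```

A grid search can only give an upper bound on the minmax angle; that no configuration does better is the content of the theorem.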

Next, consider a regular tetrahedron. In this case, a “symmetric” disposition
of an extreme configuration, as on a cube, is elusive.¹ In fact, results are far less
trivial and somewhat curious.

We begin with a few definitions:

Definitions. Fix a regular tetrahedron $T$ and choose a set $F = \{f_1, f_2, f_3\}$ of 3
faces of $T$ meeting at a vertex $O$, so that the order $f_1, f_2, f_3$ is counter-clockwise
around $O$, viewed from above $O$ - imagine the face opposite $O$ to be the base of
$T$. Label the remaining face of $T$, the one opposite $O$, as $f_4$.

Choose a direction, call it the standard axis, on each of $f_1, f_2, f_3$ in the
following symmetric manner: the standard axis on $f_i$ is along the edge opposite
$O$ in $f_i$, oriented by a counter-clockwise, viewed from above, traversal of the
perimeter of $f_4$. Label the vertices of $f_4$ as $A, B, C$, where the edge $AB$ ($BC$, $CA$)
lies on face $f_1$ ($f_2$, $f_3$) and is oriented along the standard axis. See Figure 2(b).

Now that a standard axis has been chosen on each face of $F$, given any
direction, i.e., a non-null vector, $v$ on a face $f_i \in F$, define the angle of $v$ (on
face $f_i$) to be $\theta$, $0 \le \theta < 2\pi$, the angle measured counter-clockwise, viewed from
above, from the standard axis on $f_i$ to $v$.

An equicycle of vectors on $F$ is a configuration $C = \{v_1, v_2, v_3\}$ on $F$ such
that the angle between each pair of vectors from $C$ is equal to the same value $\phi$,
which is called the angle of the equicycle.

A regular equicycle of vectors on $F$ is an equicycle so that the angle of each
vector on its corresponding face is the same. See Figure 2(b). Remark: It may
appear somewhat unintuitive that there exist irregular equicycles on $F$, which
is probably why the following crucial lemma has a surprisingly non-trivial proof
- keep the face labeling scheme above in mind:

¹ It is amusing (and frustrating) to try, as we did, to discover an extreme configuration
on a cardboard model of a regular tetrahedron with rotating arrows pinned to each
face!




Joint Separation of Geometric Clusters 235 



Lemma 4. Let $A_1, A_2, A_3$ be the mid-points of the sides opposite $O$ in the faces
$f_1, f_2, f_3$, resp. Then extreme configurations of $F = \{f_1, f_2, f_3\}$ are precisely
configurations of the form
$\alpha_1 \vec{OA_1}, \alpha_2 \vec{OA_2}, \alpha_3 \vec{OA_3}$, where $\alpha_1, \alpha_2, \alpha_3$ are arbitrary non-zero scalars, either all
positive or all negative. See Figure 2(c).

Further, any configuration of vectors $\{v_1, v_2, v_3\}$ on $F$ that is not extreme can
be perturbed an arbitrarily small amount to a configuration with smaller maximal
angle. In other words, the function max($C$) on the space of configurations $C$ on
$F$ has no local minimum that is not a global minimum.

Remark: The first claim of Lemma 4 is geometrically intuitive: an extreme
configuration of $F$ is situated symmetrically on the faces $f_i$, $1 \le i \le 3$, looking
out from $O$.
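
For illustration, the configuration described in Lemma 4 can be evaluated numerically. The Python sketch below places a regular tetrahedron at the hypothetical coordinates $O = (1,1,1)$, $A = (1,-1,-1)$, $B = (-1,1,-1)$, $C = (-1,-1,1)$ (an assumption made here for convenience, not the placement used in the paper's figures) and computes the pairwise angles between the three vectors from $O$ to the mid-points of $AB$, $BC$ and $CA$. By a direct calculation that is not taken from the text, each pair has cosine 5/6, so the three angles are equal, at roughly 33.6°.

```python
import math
from itertools import combinations

# Hypothetical coordinates of a regular tetrahedron with apex O.
O, A, B, C = (1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)


def median_from_apex(p, q, apex=O):
    """Vector from the apex to the mid-point of the edge pq."""
    return tuple((a + b) / 2 - o for a, b, o in zip(p, q, apex))


def angle_deg(u, v):
    """Angle in degrees between two non-null vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# O A_1 lies on face OAB (f_1), O A_2 on OBC (f_2), O A_3 on OCA (f_3).
medians = [median_from_apex(A, B), median_from_apex(B, C), median_from_apex(C, A)]
pairwise = [angle_deg(u, v) for u, v in combinations(medians, 2)]
print(pairwise)  # three equal angles, each arccos(5/6) in degrees
```

If Lemma 4 is taken as given, this common value is the maximal angle of an extreme configuration on $F$, that is, minmax($F$) for the three faces meeting at $O$.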

We defer the somewhat lengthy proof of Lemma 4 till after our main theorem. 
But first we need another preliminary lemma. 

Lemma 5. If $C$ is an extreme configuration of unit vectors on a regular
tetrahedron $T$, then $G'_C$ contains a simple 4-cycle; in other words, $G_C$ contains a
simple 4-cycle of maximal edges.

Further, relabeling faces of $T$ if necessary, $C$ is of the form $\{v_1, v_2, v_3, v_4\}$,
$v_i$ lying on face $f_i$, $1 \le i \le 4$, of $T$, where (a) $v_1 = v_4$ lies on the edge shared by
$f_1$ and $f_4$, and (b) $v_1, v_2, v_3$ label vertices of a cycle of maximal edges in $G_C$.

Proof. For the first statement of the lemma, let $C = \{v_1, v_2, v_3, v_4\}$, with $v_i$
lying on face $f_i$ of $T$, $1 \le i \le 4$. By Lemma 2, as $C$ is an extreme configuration,
$G_C$ contains at least a simple 3-cycle of maximal edges. Suppose, if
possible, that it does not contain a simple 4-cycle of maximal edges. Say then,
w.l.o.g., that $g_1, g_2, g_3$ are the vertices in a simple 3-cycle of maximal edges of
$G_C$ (corresponding - recall the definition of $G_C$ - to vectors $v_1, v_2, v_3$, resp.).

Denote by $\theta$ the minmax angle of a regular tetrahedron, i.e., the weight of
a maximal edge in $G_C$. By rotating $v_4$ on $f_4$, if necessary, one can assume that
it makes an angle of $\theta$ with at least one of $v_1, v_2, v_3$ - w.l.o.g., assume that the
angle between $v_4$ and $v_1$ is $\theta$. Now, both the angle between $v_4$ and $v_2$ and the
angle between $v_4$ and $v_3$ must be less than $\theta$, for, if not, we would have a simple
4-cycle of maximal edges. Therefore, Figure 2(d) is a corresponding depiction of
$G_C$.

Consider the set of planes $H = \{f_2, f_3, f_4\}$. By Lemma 1, there is a perturbation
of $v_2$ and $v_3$ on the faces $f_2$ and $f_3$, resp., small enough that it reduces
the angle between $v_2$ and $v_3$, but does not increase either the angle between $v_4$
and $v_2$ or the angle between $v_4$ and $v_3$ to the value $\theta$. One, therefore, reaches
a configuration on $H$ whose maximal angle is less than $\theta$. This implies that
minmax($H$) < $\theta$.

Next, consider the set of planes F = {/i, /2, /s}- There is a rigid motion of 
that takes F to H, i.e., maps the 3 planes of F one-one to those of H. It follows 
that minmax{F) < 9 as well. Therefore, the configuration D = {vi,V2,vs} on 
F is not extreme. 
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As the configuration D on F is not extreme, it can be perturbed, using the 
second claim of Lemma 4, an arbitrarily small amount to a configuration D' , 
such that max{D') < 9. Since the angles between and V2 and U4 and U3 are 
both less than 9, one can find a perturbation (mimicking exactly the argument in 
the last inductive step of Lemma 2) of the configuration C = {ui, U 2 , V 3 , U 4 } to a 
configuration C", so that max{C) < 9: roughly, as vi, V2 and V3 rotate towards 
the configuration D' , also rotate V4 towards vi reducing the angle between them, 
at the same time using the “slack” in the angles between V4 and V2 and V4 and 
Vs to not let these angles grow to 9. This, of course, contradicts that C is an 
extreme configuration. 

We conclude that $G_C$ indeed contains a simple 4-cycle of maximal edges.
This proves the first statement of the lemma.

For the second statement, observe first that, by Lemma 3, at most three
of the $v_i$ are distinct. Clearly, not all the $v_i$ are identical, given that planes of
a tetrahedron do not intersect in one straight line. If only two of the $v_i$ are
distinct, say, w.l.o.g., $v_1$ and $v_2$, then each must label a pair of vertices of $G_C$
and, therefore, each must lie on the edge shared by the corresponding pair of
faces of $T$. We conclude that, in such a case, $v_1$ and $v_2$ would lie one each on two
skew (non-intersecting) edges of $T$. However, it is easily verified that no such
configuration is extreme. We conclude that exactly three of the vectors $v_i$ are
distinct.

Relabeling faces of $T$ if necessary, assume that $v_1 = v_4$ and that $\{v_1, v_2, v_3\}$
is a set of distinct vectors. Since $v_1$ and $v_4$ lie on the faces $f_1$ and $f_4$, resp.,
$v_1 = v_4$ lies on the edge shared by $f_1$ and $f_4$. Therefore, in this case, a simple
4-cycle in $G_C$ must be $g_1, g_2, g_4, g_3, g_1$. Accordingly, the graph $G_C$ is as depicted
in Figure 2(e), where one must show that the angle between $v_2$ and $v_3$ is indeed $\theta$
and not less. However, if the angle between $v_2$ and $v_3$ were less than $\theta$, then $v_2$ and
$v_3$ could be rotated simultaneously towards $v_1 (= v_4)$ to reach a configuration
whose maximal angle is less than $\theta$, contradicting the hypothesis on $\theta$. This
proves the second statement. □

With Lemma 5 in hand we present our main theorem, whose proof now
reduces to solving a “finite and local” optimization problem.

Theorem 2. The minmax angle of a regular tetrahedron is
$\cos^{-1}\left(\frac{\sqrt{17}-1}{8}\right) \approx 67.0206°$.

Proof. By Lemma 5, an extreme configuration $C$ on a regular tetrahedron is
of the form $\{v_1, v_2, v_3, v_4\}$, where (a) $v_i$ lies on a face $f_i$, $1 \le i \le 4$, of $T$, (b)
$v_1 = v_4$ lie on the edge shared by $f_1$ and $f_4$ (and may be assumed oriented along
the standard axis of $f_1$), and (c) $v_1, v_2, v_3$ are the vertices of a cycle of maximal
edges in $G_C$. Consider the regular tetrahedron of side length 2 as located in
Figure 3(a): $f_1, f_2, f_3$ are the faces $OAB$, $OBC$, $OCA$, resp. We, therefore, need
now determine all equicycles $\{v_1, v_2, v_3\}$ of vectors on $F = \{f_1, f_2, f_3\}$, such that
$v_1$ is directed along the standard axis on $f_1$; so, if $v_1, v_2, v_3$ are of angles
$\theta, \sigma, \psi$ on $f_1, f_2, f_3$, resp., then $\theta = 0$. Assuming all $v_i$, $1 \le i \le 4$, to be of
unit length, we have from trigonometry on Figure 3(a):

$$v_1 = \mathbf{i} \qquad (1)$$

$$v_2 = \left(-\frac{1}{2}\cos\sigma - \frac{1}{2\sqrt{3}}\sin\sigma\right)\mathbf{i} + \left(\frac{\sqrt{3}}{2}\cos\sigma - \frac{1}{6}\sin\sigma\right)\mathbf{j} + \frac{2\sqrt{2}}{3}\sin\sigma\,\mathbf{k} \qquad (2)$$

$$v_3 = \left(-\frac{1}{2}\cos\psi + \frac{1}{2\sqrt{3}}\sin\psi\right)\mathbf{i} + \left(-\frac{\sqrt{3}}{2}\cos\psi - \frac{1}{6}\sin\psi\right)\mathbf{j} + \frac{2\sqrt{2}}{3}\sin\psi\,\mathbf{k} \qquad (3)$$

Fig. 3. (a) Regular tetrahedron of side length 2 - standard axes on $f_1, f_2, f_3$ bold;
(b) An extreme configuration on a regular tetrahedron with the angle in degrees of
vectors on faces $f_1$, $f_2$ and $f_3$ (see definitions preceding Lemma 4) indicated; (c) Cone
$k$ with semi-vertical angle $\phi$ for the proof of Lemma 4; (d) Hypothetical graphs of
$\phi_{left,left,left}$, $\phi_{right,left,right}$.



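As $\angle v_1 v_2 = \angle v_1 v_3$ one has $v_1 \cdot v_2 = v_1 \cdot v_3$, so that

$$-\frac{1}{2}\cos\sigma - \frac{1}{2\sqrt{3}}\sin\sigma = -\frac{1}{2}\cos\psi + \frac{1}{2\sqrt{3}}\sin\psi \qquad (4)$$

which solves to

$$\psi = \sigma - \frac{\pi}{3} \quad \text{or} \quad \psi = -\sigma.$$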
o 

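If $\psi = \sigma - \frac{\pi}{3}$ then one has from equation (3) that

$$v_3 = \left(-\frac{1}{2}\cos\sigma - \frac{1}{2\sqrt{3}}\sin\sigma\right)\mathbf{i} + \left(-\frac{1}{2\sqrt{3}}\cos\sigma - \frac{5}{6}\sin\sigma\right)\mathbf{j} + \left(\frac{\sqrt{2}}{3}\sin\sigma - \frac{\sqrt{2}}{\sqrt{3}}\cos\sigma\right)\mathbf{k} \qquad (5)$$

As $\angle v_2 v_3 = \angle v_1 v_2$ one has $v_2 \cdot v_3 = v_1 \cdot v_2$, and so from equations (1), (2) and
(5) follows

$$\frac{2}{3}\sin^2\sigma - \frac{2}{\sqrt{3}}\sin\sigma\cos\sigma = -\frac{1}{2}\cos\sigma - \frac{1}{2\sqrt{3}}\sin\sigma \qquad (6)$$

Writing $X = v_1 \cdot v_2 = -\frac{1}{2}\cos\sigma - \frac{1}{2\sqrt{3}}\sin\sigma$, one has by manipulating
equation (6) that $4X^2 + X - 1 = 0$, which solves to $X = \frac{-1 \pm \sqrt{17}}{8}$, i.e.,
$X = 0.3904$ or $X = -0.6404$. The corresponding angles of the equicycles are
$\cos^{-1} X \approx 67.0206°$ or $129.8217°$.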

If $\psi = -\sigma$ then, calculating similarly, $X = v_1 \cdot v_2$ satisfies the quadratic
$2X^2 - X - 1 = 0$, which solves to $X = 1$ or $X = -\frac{1}{2}$. $X = -\frac{1}{2}$ corresponds to a
regular equicycle of angle 120°, where each of $v_1, v_2, v_3$ lies on and is oriented
either along or opposite the standard axis of its face; $X = 1$ does not give a
feasible solution.
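
The irregular equicycle behind the 67° value can be checked numerically. The sketch below is only an illustration: it assumes the coordinate expressions for vectors on $f_2$ and $f_3$ as read from equations (2) and (3), takes the positive root of $4X^2 + X - 1 = 0$, recovers $\sigma$ from $X = -\frac{1}{\sqrt{3}}\cos(\sigma - \frac{\pi}{6})$, and sets $\psi = \sigma - \frac{\pi}{3}$. The helper names (`on_f2`, `on_f3`, `angle_deg`) are hypothetical.

```python
import math

SQRT2, SQRT3 = math.sqrt(2.0), math.sqrt(3.0)

def on_f2(sigma):
    """Unit vector of angle sigma on face f2 (coordinates assumed from equation (2))."""
    c, s = math.cos(sigma), math.sin(sigma)
    return (-c / 2 - s / (2 * SQRT3), SQRT3 * c / 2 - s / 6, 2 * SQRT2 * s / 3)

def on_f3(psi):
    """Unit vector of angle psi on face f3 (coordinates assumed from equation (3))."""
    c, s = math.cos(psi), math.sin(psi)
    return (-c / 2 + s / (2 * SQRT3), -SQRT3 * c / 2 - s / 6, 2 * SQRT2 * s / 3)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def angle_deg(a, b):
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(a, b)))))

# Positive root of 4X^2 + X - 1 = 0. The negative root is skipped here because
# |v1 . v2| cannot exceed 1/sqrt(3) under the assumed form of equation (2).
X = (-1 + math.sqrt(17)) / 8

# v1 . v2 = -(1/2)cos(sigma) - (1/(2 sqrt 3))sin(sigma) = -(1/sqrt 3)cos(sigma - pi/6)
sigma = math.pi / 6 + math.acos(-SQRT3 * X)
psi = sigma - math.pi / 3

v1 = (1.0, 0.0, 0.0)          # equation (1)
v2 = on_f2(sigma)
v3 = on_f3(psi)

angles = [angle_deg(v1, v2), angle_deg(v1, v3), angle_deg(v2, v3)]
print(angles)
```

All three pairwise angles coincide, as an equicycle requires. Evaluated at full precision the common angle is about 67.021°, which agrees with the quoted 67.0206° to two decimal places; the quoted figure matches the inverse cosine of the rounded value 0.3904.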

The theorem follows. See Figure 3(b) for an extreme configuration of a regular 
tetrahedron depicted on an unfolding. □ 

Proof of Lemma 4. To determine extreme configurations on $F$, by Lemma 2,
one need consider only equicycles. Therefore, let us examine how equicycles are
formed on $F$. Consider a unit vector $v$ on $f_1$ of angle $\theta$ (see again definitions
preceding the statement of the lemma and Figure 2(b) for the labeling scheme
corresponding to $F$). Locate the startpoint of $v$ at $B$. Let the projection of $v$ on
$f_2$ be the vector $v'$ of angle $\gamma = f(\theta)$ on $f_2$ - the function $f$ is well-defined as the
projection on to $f_2$ of any non-null vector on $f_1$ is again non-null. Let the angle
between $v$ and $v'$ be $\alpha = h(\theta)$: $h$ is well-defined such that always $0 \le h(\theta) \le \pi/2$.

Consider now a (possibly degenerate) cone $k$ with semi-vertical angle $\phi$,
$0 \le \phi \le \pi$, apex at $B$, and axis along $v$ (if $\phi > \pi/2$, imagine a cone with
semi-vertical angle $\pi - \phi$ and axis along $-v$). There are three possibilities:

1. $\phi < h(\theta)$ or $\phi > \pi - h(\theta)$, when $k$ does not intersect $f_2$ except at $B$: there is
no vector on $f_2$ making an angle $\phi$ with $v$.

2. $\phi = h(\theta)$ or $\phi = \pi - h(\theta)$, when $k$ is tangential to $f_2$: there is exactly one unit
vector (the length of the vector is immaterial but we fix it to avoid clumsy
statements) on $f_2$ making an angle $\phi$ with $v$.

3. $h(\theta) < \phi < \pi - h(\theta)$, when $k$ intersects $f_2$ in two distinct straight lines: there
are exactly two unit vectors on $f_2$ making an angle $\phi$ with $v$.

It is seen by symmetry, in cases 2 and 3, that the two (possibly equal) unit
vectors $w_1$ and $w_2$ on $f_2$, making an angle of $\phi$ with $v$, are at angles $f(\theta) \pm \beta$
on $f_2$ (see Figure 3(c)), where again $\beta = g(\theta, \phi)$ is a well-defined function with
range $0 \le g(\theta, \phi) \le \pi$ in the domain where $h(\theta) \le \phi \le \pi - h(\theta)$. Observe
that $g(\theta, \phi) = 0$ if $\phi = h(\theta)$, and $g(\theta, \phi) = \pi$ if $\phi = \pi - h(\theta)$. Distinguish, if
required, between $w_1$ and $w_2$ by declaring that $w_i$ is the right (left) vector on
$f_2$ making an angle of $\phi$ with $v$, if $w_i$ lies in the half-space in $f_2$ of the straight
line containing $v'$ that is to the right (left) of the observer who is standing on
$f_2$ to the outside of $T$ and facing in the direction of $v'$. Given our standard axes
and angle measuring conventions, the right (resp., left) vector on $f_2$ making an
angle of $\phi$ with $v$ is of angle $f(\theta) - g(\theta, \phi)$ (resp., $f(\theta) + g(\theta, \phi)$) on $f_2$.

Next, we describe a procedure to derive all equicycles of unit vectors
containing the particular vector $v$ of given angle $\theta$ on $f_1$. Fix a governing tuple
$(X_1, X_2, X_3)$ where each $X_i$, $1 \le i \le 3$, has the value “left” or “right”. Solve -
though we shall not explicitly do so - the following 3 equations in 3 unknowns
to find $\phi$, $\theta'$ and $\theta''$:

$$\theta' = f(\theta) \pm g(\theta, \phi) \qquad (7)$$

$$\theta'' = f(\theta') \pm g(\theta', \phi) \qquad (8)$$

$$\theta = f(\theta'') \pm g(\theta'', \phi) \qquad (9)$$

where the sign on the RHS of equation $(i + 6)$, $1 \le i \le 3$, is $-$ or $+$ according as
$X_i$ is “right” or “left”, respectively. Intuitively, start with $v$ on $f_1$, choose one of
the two vectors on $f_2$ at angle $\phi$ to $v$, choose again one of two vectors on $f_3$
at angle $\phi$ to the one chosen on $f_2$, finally choose a vector on $f_1$ at angle $\phi$ to
the one chosen on $f_3$, and this last vector must be $v$ again (by the symmetry of
the tetrahedron the same functions $f$ and $g$ apply at each step).

The at most 8 equicycles of unit vectors containing the vector $v$ are obtained
by solving the 8 sets of equations as above, corresponding to the 8 choices of the
governing tuple $(X_1, X_2, X_3)$, and, accordingly, determining the angle ($\phi$) of the
equicycle and the angles ($\theta'$ and $\theta''$) of the other two vectors in the equicycle on
their respective faces. In other words, $\phi$, $\theta'$ and $\theta''$ are determined as functions of
$\theta$, for each choice of the governing tuple $(X_1, X_2, X_3)$. Denote by $\phi_{X_1,X_2,X_3}$ the
function $\phi$ of $\theta$ for the choice $(X_1, X_2, X_3)$.

Given our standard axes and angle measuring conventions it may be seen
that a (left, left, left) choice for the governing tuple gives exactly all regular
equicycles.

Clearly, we need to determine, for each choice of the governing tuple
$(X_1, X_2, X_3)$, the minima of $\phi_{X_1,X_2,X_3}$ as a function of $\theta$. Attempting this
directly by computing an explicit expression of $\phi_{X_1,X_2,X_3}$ in terms of $\theta$ and then
differentiating to use calculus seems forbiddingly complicated. Instead, we apply
a 3-step plan based on determining certain properties of the functions involved:

(a) Observing that a (left, left, left) choice of the governing tuple gives regular
equicycles, find the maxima and minima in this case by elementary means.

(b) Analyze the simultaneous equations in $\theta$, $\theta'$, $\theta''$ and $\phi$, given
$\frac{d\phi_{X_1,X_2,X_3}}{d\theta} = 0$, to conclude that the number of values of $\theta$ at which
$\phi_{X_1,X_2,X_3}$ is extreme is equal, for all choices of the governing tuple $(X_1, X_2, X_3)$.

(c) Use the above two to draw conclusions for the cases of the irregular
equicycles.

(a) Locate a regular tetrahedron of side length 2 as in Figure 3(a), where
faces $f_1, f_2, f_3$ are $OAB$, $OBC$, $OCA$, resp. Consider a regular equicycle of unit
length vectors $v_1, v_2, v_3$ lying on $f_1, f_2, f_3$, resp., of angle $\theta$ in their respective
faces (so that $\theta = \sigma = \psi$ in Figure 3(a)). Then, by trigonometry:

$$v_1 = \cos\theta\,\mathbf{i} + \frac{1}{3}\sin\theta\,\mathbf{j} + \frac{2\sqrt{2}}{3}\sin\theta\,\mathbf{k} \qquad (10)$$

$$v_2 = \left(-\frac{1}{2}\cos\theta - \frac{1}{2\sqrt{3}}\sin\theta\right)\mathbf{i} + \left(\frac{\sqrt{3}}{2}\cos\theta - \frac{1}{6}\sin\theta\right)\mathbf{j} + \frac{2\sqrt{2}}{3}\sin\theta\,\mathbf{k} \qquad (11)$$

$$\text{(therefore)} \quad v_1 \cdot v_2 = -\frac{1}{2}\cos^2\theta + \frac{5}{6}\sin^2\theta \qquad (12)$$

Differentiating equation (12) w.r.t. $\theta$, one sees that
$\phi_{left,left,left} = \cos^{-1}(v_1 \cdot v_2)$ has two (global) maxima at $\theta = 0, \pi$, and two
(global) minima at $\theta = \frac{\pi}{2}, \frac{3\pi}{2}$.

One concludes that (i) on the space of regular equicycles $C$, $\mathit{max}(C)$ is a
global minimum exactly at the configurations described in the first part of the
lemma, and (ii) a regular equicycle $C$, where $\mathit{max}(C)$ is not a global minimum, is
not a local minimum either, i.e., it can be perturbed an arbitrarily small amount
to a regular equicycle with smaller maximal angle.
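
The location of these extrema can be confirmed by sampling. The following sketch assumes the coordinates of a regular equicycle as read from equations (10) and (11) and scans $\theta$ in half-degree steps; the function names are hypothetical and the scan is an illustration, not part of the proof.

```python
import math

SQRT2, SQRT3 = math.sqrt(2.0), math.sqrt(3.0)

def v1(theta):
    """Vector of angle theta on f1, coordinates assumed from equation (10)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s / 3, 2 * SQRT2 * s / 3)

def v2(theta):
    """Vector of angle theta on f2, coordinates assumed from equation (11)."""
    c, s = math.cos(theta), math.sin(theta)
    return (-c / 2 - s / (2 * SQRT3), SQRT3 * c / 2 - s / 6, 2 * SQRT2 * s / 3)

def phi_deg(theta):
    """Angle (degrees) of the regular equicycle whose vectors all have angle theta."""
    d = sum(p * q for p, q in zip(v1(theta), v2(theta)))
    return math.degrees(math.acos(max(-1.0, min(1.0, d))))

# Sample theta over one full turn in steps of 0.5 degrees.
samples = [2 * math.pi * i / 720 for i in range(720)]
values = [phi_deg(t) for t in samples]
largest, smallest = max(values), min(values)

# Record, in degrees, where the sampled maximum and minimum are attained.
argmax = [round(math.degrees(t), 1) for t, v in zip(samples, values)
          if abs(v - largest) < 1e-9]
argmin = [round(math.degrees(t), 1) for t, v in zip(samples, values)
          if abs(v - smallest) < 1e-9]
print(largest, argmax)
print(smallest, argmin)
```

Under these assumptions the maximum angle of 120° occurs at $\theta = 0, \pi$ and the minimum, $\cos^{-1}(5/6)$, at $\theta = \pi/2, 3\pi/2$.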

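(b) From equations (7) and (8) above, for a given choice of the governing tuple
$(X_1, X_2, X_3)$, one has the following two equations, resp. (the ± sign depending
on $X_1$ and $X_2$):

$$d\theta' = \frac{df}{d\theta}\,d\theta \pm \frac{\partial g}{\partial\theta}\,d\theta \pm \frac{\partial g}{\partial\phi_{X_1,X_2,X_3}}\,d\phi_{X_1,X_2,X_3} \qquad (13)$$

$$d\theta'' = \frac{df}{d\theta'}\,d\theta' \pm \frac{\partial g}{\partial\theta'}\,d\theta' \pm \frac{\partial g}{\partial\phi_{X_1,X_2,X_3}}\,d\phi_{X_1,X_2,X_3} \qquad (14)$$

From equation (9) one has that (the ± sign depending on $X_3$)

$$0 = d\bigl(f(\theta'') \pm g(\theta'', \phi_{X_1,X_2,X_3}) - \theta\bigr) = \frac{df}{d\theta''}\,d\theta'' \pm \frac{\partial g}{\partial\theta''}\,d\theta'' \pm \frac{\partial g}{\partial\phi_{X_1,X_2,X_3}}\,d\phi_{X_1,X_2,X_3} - d\theta$$

= ... using equations (14) and (13) to substitute $d\theta''$ and then $d\theta'$, resp.

$$= \left(\frac{df}{d\theta''} \pm \frac{\partial g}{\partial\theta''}\right)\left(\frac{df}{d\theta'} \pm \frac{\partial g}{\partial\theta'}\right)\left(\frac{df}{d\theta} \pm \frac{\partial g}{\partial\theta}\right)d\theta \pm \left(\left(\frac{df}{d\theta''} \pm \frac{\partial g}{\partial\theta''}\right)\left(\left(\frac{df}{d\theta'} \pm \frac{\partial g}{\partial\theta'}\right) \pm 1\right) \pm 1\right)\frac{\partial g}{\partial\phi_{X_1,X_2,X_3}}\,d\phi_{X_1,X_2,X_3} - d\theta \qquad (15)$$

Dividing equation (15) throughout by $d\theta$ and setting $\frac{d\phi_{X_1,X_2,X_3}}{d\theta} = 0$ obtains

$$\left(\frac{df}{d\theta''} \pm \frac{\partial g}{\partial\theta''}\right)\left(\frac{df}{d\theta'} \pm \frac{\partial g}{\partial\theta'}\right)\left(\frac{df}{d\theta} \pm \frac{\partial g}{\partial\theta}\right) = 1 \qquad (16)$$

Equations (7), (8), (9) and (16) give 4 simultaneous equations in 4 unknowns
that solve, in particular, to give values of $\theta$ when $\phi_{X_1,X_2,X_3}$ is an extreme value:
observing that, for different choices of the governing tuple $(X_1, X_2, X_3)$, the
summands in the 4 equations differ only in signs of their coefficients, one concludes
that the number of values of $\theta$ at which $\phi_{X_1,X_2,X_3}$ is extreme is the same for all
choices of $(X_1, X_2, X_3)$.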

(c) Since, by (a), there are 4 values of $\theta$ at which $\phi_{left,left,left}$ is extreme,
it follows, from (b), that $\phi_{X_1,X_2,X_3}$ is extreme for 4 values of $\theta$, as well, for each
$(X_1, X_2, X_3)$.

Now, for each $(X_1, X_2, X_3)$, it is seen that there are at least two maximum and
two minimum values (both of which are global and so local as well) of $\phi_{X_1,X_2,X_3}$.
In particular, by compactness and the non-constantness of $\phi_{X_1,X_2,X_3}$, there is at
least one global maximum and one global minimum; reversing the vectors in the
configurations corresponding to these two extrema gives another maximum and
minimum, respectively. As the total of values of $\theta$ at which $\phi_{X_1,X_2,X_3}$ is extreme
is known to be 4, one concludes that there are exactly 2 values of $\theta$ where
$\phi_{X_1,X_2,X_3}$ is a maximum and 2 where it is a minimum, for any $(X_1, X_2, X_3)$.

Next, we claim that each of these maximum and minimum values, for any
governing tuple $(X_1, X_2, X_3)$, corresponds, also, to a regular equicycle. For, if
not, one would have a minimum value of some $\phi_{X_1,X_2,X_3}$ from an irregular
equicycle of vectors $v_1, v_2, v_3$ of angles $\theta_1, \theta_2, \theta_3$ (so not all equal by irregularity)
on the faces $f_1, f_2, f_3$, resp. By the symmetry of a regular tetrahedron, one
could then derive at least one more value of $\theta$, other than $\theta_1$, at which $\phi_{X_1,X_2,X_3}$
is a minimum, by cyclically permuting the vectors $v_1, v_2, v_3$ to configurations
consisting of vectors of angles $(\theta_2, \theta_3, \theta_1)$ and $(\theta_3, \theta_1, \theta_2)$ on $(f_1, f_2, f_3)$, resp. This
would contradict the bound on the number of values of $\theta$ at which $\phi_{X_1,X_2,X_3}$ is
minimum.

To complete the proof of the second part of the lemma, observe that, if $C$ is
a configuration at which $\mathit{max}(C)$ is a local minimum in the space of
configurations, then $C$ must be an equicycle arising from some choice of a governing tuple
$(X_1, X_2, X_3)$, where $\phi_{X_1,X_2,X_3}$ is a local minimum value for the corresponding
value of $\theta$ (the angle of the vector in $C$ on face $f_1$). From the arguments
above, one sees that $C$ belongs to the space of regular equicycles, where we
know that configurations where the maximal angle is minimum are precisely
those described in the first part of the lemma; further, we also know that any
other regular equicycle can be perturbed, in the space of regular equicycles, to
reduce its maximal angle. Intuitively, the minima (and maxima) of $\phi_{X_1,X_2,X_3}$,
for any $(X_1, X_2, X_3)$, occur where the graph of $\phi_{X_1,X_2,X_3}$ intersects the graph of
$\phi_{left,left,left}$ (see Figure 3(d)). □

Tabulated next are minmax angles of regular polyhedra that we have
calculated to date (see notes following).

Regular polyhedron (faces)    Minmax angle in approx. degrees
Tetrahedron (4)               67.02
Cube (6)                      60.00
Octahedron (8)                67.02
Dodecahedron (12)             ?
Icosahedron (20)              ?


1. The faces of an octahedron can be split into four parallel pairs. We, therefore,
need only consider four mutually non-parallel faces; precisely, choose four
faces adjacent to any one vertex. Results analogous to those for a tetrahedron
can be proved. Surprisingly, the same quadratics arise for the equicycles
on three (of the chosen four) faces of the octahedron as arise in the proof
of Theorem 2 for a tetrahedron: we do not have a clear understanding at
present, but suspect this phenomenon is due to the symmetric situation of
two of the three faces w.r.t. the fixed vector on the third. Consequently the
minmax angles of the tetrahedron and octahedron are identical.

2. To date our calculations for the dodecahedron and icosahedron are
incomplete.

4 Generalizing the Measure, Conclusions, and Open Problems

The measure $\mathit{minmax}(F)$ of the joint separation of a finite set of planes $F$
generalizes naturally. Let $S$ be a cluster of objects $s_1, s_2, \ldots, s_n$ from some space
$\mathcal{S}$, such that each object $s_i$ is associated with a non-null subset $X_i$ of some metric
space $M$ with distance measure $d$. A spanning configuration $C$ on $S$ is a sequence
$x_1, x_2, \ldots, x_n$ of elements of $M$ such that $x_i \in X_i$, $1 \le i \le n$. The diameter of
$C$, denote it $\mathit{max}(C)$, is $\sup\{d(x_i, x_j) \mid 1 \le i, j \le n\}$. Define the joint separation
of $S$ by $\mathit{minmax}(S) = \inf\{\mathit{max}(C) \mid C \text{ is a spanning configuration on } S\}$.
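
When every $X_i$ is finite, the definition can be evaluated directly by enumerating all spanning configurations. The sketch below is a brute-force illustration only (its running time is the product of the set sizes); the function names, the sample clusters, and the choice of the $L_\infty$ metric are assumptions made for the example.

```python
from itertools import product

def diameter(config, dist):
    """max(C): the largest pairwise distance within a spanning configuration."""
    return max(dist(p, q) for p in config for q in config)

def joint_separation(clusters, dist):
    """minmax(S) by exhaustive search: one element is picked from each cluster."""
    best_value, best_config = float("inf"), None
    for config in product(*clusters):
        value = diameter(config, dist)
        if value < best_value:
            best_value, best_config = value, config
    return best_value, best_config

def l_inf(p, q):
    """L-infinity distance between two points given as coordinate tuples."""
    return max(abs(a - b) for a, b in zip(p, q))

# Three hypothetical clusters ("colors") of points in the plane.
clusters = [
    [(0, 0), (5, 5)],
    [(1, 4), (6, 5)],
    [(3, 0), (5, 7)],
]
value, config = joint_separation(clusters, l_inf)
print(value, config)
```

For these sample clusters the search selects (5, 5), (6, 5) and (5, 7), whose $L_\infty$ diameter is 2.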



Fig. 4. (a) Points of 3 colors on a plane with a set of 3 points of all different colors
contained in the smallest square (equivalently, having the smallest $L_\infty$ diameter) shown;
(b) 3 regions A, B and C with a set of points chosen from each such that the $L_2$ diameter
is minimum.

We describe a couple of practical scenarios to apply joint separation.

Consider $n$ points in real $m$-space $\mathbb{R}^m$, each having one of fixed $k$ colors.
In other words, we have a cluster of $k$ finite subsets of $\mathbb{R}^m$. Problem: given an
$L_p$ metric on $\mathbb{R}^m$, find a subset of $\mathbb{R}^m$ that contains one point of each color
(i.e., a spanning configuration), of minimum diameter. See Figure 4(a). If one
thinks of $k$ populations drawn from an $m$-attribute database, we are measuring
a joint separation of the different populations. The $L_2$ and $L_\infty$ metrics seem
most natural in the databases context.

Aggarwal et al. [1] give efficient algorithms for the following related problems
by applying higher-order Voronoi diagrams: given a set $S$ of $n$ points on a plane
and a fixed $k$, find a subset of $S$ of size $k$ with minimum ($L_2$, $L_\infty$) diameter. It
is not clear at present if their techniques generalize to the multi-colored setting.

Consider $k$ regions on the plane with $L_2$ metric. Problem: find a set of points,
one from each region, of minimum diameter. See Figure 4(b). Clearly, the
measure of joint separation of the regions is relevant to, say, transmission facility
location. We are not aware of efficient algorithms related to this problem.

Another simple but more theoretical example, motivated by the definition
for planes: let $S$ be a set of oriented polygonal arcs $s_1, s_2, \ldots, s_n$, associated
each, resp., with the set $X_i$ of oriented segments, normalized to unit length
vectors, comprising it. Use the metric of angles between such vectors to define
$\mathit{minmax}(S)$ (similar to what we did for planes). Question: if the arcs in $S$ are
all, say, known to be convex, is there an efficient way to compute $\mathit{minmax}(S)$?

The general notion of joint separation seems natural enough that one expects
applications in various domains.

The complexity of computing joint separation is always of interest. Even
in the case of planes in $\mathbb{R}^3$ that we have discussed, the question of how to
efficiently determine the joint separation (i.e., minmax angle) of arbitrary finite
sets remains open, though the general results from Section 2 could be useful.
For regular polyhedra we believe completing calculations of the joint separation
of the faces of a dodecahedron and an icosahedron will be tedious but not hard.
Most welcome would be insights that lead to short and elegant calculations.

It would be useful, as well, to prove general theoretical results about joint
separation (independent of the setting) - a trivial example being
$\mathit{minmax}(S') \le \mathit{minmax}(S)$ if $S' \subseteq S$. Interesting results would likely require at
least some assumptions on the regular subsets of $M$, i.e., subsets associated with
some object of $\mathcal{S}$. Comparisons should also be made with known measures of
separations of clusters [2,4].

Abstract. The Covering Steiner problem is a common generalization
of the $k$-MST and Group Steiner problems. An instance of the Covering
Steiner problem consists of an undirected graph with edge-costs, and
some subsets of vertices called groups, with each group being equipped
with a non-negative integer value (called its requirement); the problem
is to find a minimum-cost tree which spans at least the required number
of vertices from every group. When all requirements are equal to 1, this
is the Group Steiner problem.

While many covering problems (e.g., the covering integer programs such 
as set cover) become easier to approximate as the requirements increase, 
the Covering Steiner problem remains at least as hard to approximate 
as the Group Steiner problem; in fact, the best guarantees previously 
known for the Covering Steiner problem were worse than those for Group 
Steiner as the requirements became large. In this work, we present an im- 
proved approximation algorithm whose guarantee equals the best known 
guarantee for the Group Steiner problem. 



1 Introduction 

We present an improved approximation algorithm for the Covering Steiner
problem. This covering problem has the following property that goes against the
norm for covering problems: its approximability cannot get better as the
covering requirements increase. Thus the approximability of the general Covering
Steiner problem is at least as high as for the case of all unit requirements, which
is just the Group Steiner problem. In this work, we improve on the current-best
approximation algorithms for the Covering Steiner problem given by Even et
al. [3] and Konjevod et al. [9]. Our results match the approximation guarantee
of the current-best randomized algorithm for the Group Steiner problem due to
Garg et al. [5] (see the paper of Charikar et al. [2] for a deterministic algorithm).
A suitable melding of a randomized rounding approach with a deterministic
“threshold rounding” method leads to our result.

* This material is based upon work supported in part by the National Science Founda- 
tion under Grant No. 0208005 to the second author. Part of this work was done while 
the authors were at Lucent Bell Laboratories, 600-700 Mountain Avenue, Murray 
Hill, NJ 07974-0636, USA.

Let $G = (V, E)$ be an undirected graph with a non-negative cost function
$c$ defined on its edges. Let a family $\mathcal{G} = \{g_1, g_2, \ldots, g_k\}$ of $k$ subsets of $V$ be
given; we refer to these sets $g_1, g_2, \ldots, g_k$ as groups. For each group $g_i$, a
non-negative integer $r_i \le |g_i|$ is also given, called the requirement of the group. The
Covering Steiner problem on $G$ is to find a minimum-cost tree in $G$ that contains
at least $r_i$ vertices from each group $g_i$; the special case of unit requirements (i.e.,
$r_i = 1$ for all $i$) corresponds to the Group Steiner tree problem. We denote the
number of vertices in $G$ by $n$, the size of the largest group by $N$, and the largest
requirement of a group by $K$. Logarithms in this paper will be to the base two
unless specified otherwise.

As in the paper of Garg et al. [5], we focus on the case where the given
graph $G = (V, E)$ is a tree, since the notion of probabilistic tree embeddings [1]
can be used to reduce an arbitrary instance of the problem to an instance on a
tree. Specifically, via the result of Fakcharoenphol et al. [4], a $\rho$-approximation
algorithm on tree-instances implies an $O(\rho \log n)$-approximation algorithm for
arbitrary instances. In fact, we can assume that the instance is a rooted tree
instance where the root vertex must be included in the tree that we output; this
assumption can be discharged by running the algorithm over all choices of the
root and picking the best tree.

For the special case of the Group Steiner tree problem where $K = 1$, the
current-best approximation bound for tree instances is $O((\log k) \cdot (\log N))$. For
the Covering Steiner problem, the current-best approximation algorithm for tree
instances is $O((\log k + \log K) \cdot (\log N))$ [3,9]; also, an approximation bound of

$$\frac{(\log N) \cdot \log^2 k}{\log(2(\log N) \cdot (\log k)/\log K)}$$

is also presented in [9], which is better if A > where a > 0 is a constant. 

(As mentioned above, we need to multiply each of these three approximation 
bounds by O(logn) to obtain the corresponding results for general graphs.) 

Note that these current-best approximations get worse as the requirements
increase, i.e., as $K$ increases. This is unusual for covering problems, where the
approximability gets better as the coverage requirements increase. This is
well-known, for instance, in the case of covering integer programs which include the
set cover problem; the “multiple coverage” version of set cover is one where each
element of the ground set needs to be covered by at least a given number of
sets. In particular, the approximation ratio improves from logarithmic to $O(1)$
(or even $1 + o(1)$) for families of such problems where the minimum covering
requirement $B$ grows as $\Omega(\log m)$, when $m$ is the number of constraints (see,
e.g., [10]). In light of these results, it is natural to ask whether the approximation
guarantee for the Covering Steiner problem can be better than that for the Group
Steiner problem.

This question can easily be answered in the negative; indeed, given a rooted
tree instance of the Group Steiner problem and an integer $K$, we can create an
instance of the Covering Steiner problem as follows: increase the requirement of
every group from 1 to $K$, connect $K - 1$ dummy leaves to the root with edge-cost
zero and add these leaves to all the groups. It is easily seen that any solution to
this Covering Steiner instance can be transformed to a solution to the original
Group Steiner instance with no larger cost, and that the two instances have the
same optimal solution value. Therefore, the Covering Steiner problem is at least
as hard to approximate as the Group Steiner problem. (This fact was pointed
out to us by Robert Krauthgamer.)
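
The reduction just described is mechanical, and a minimal sketch follows. The data layout (a `children` dictionary for the rooted tree, a `cost` dictionary keyed by (parent, child) edges, groups as sets of leaves) and the function name are assumptions chosen for illustration.

```python
def group_to_covering(children, cost, groups, root, K):
    """Build a Covering Steiner instance from a rooted Group Steiner tree instance.

    children: dict mapping each node to the list of its children
    cost:     dict mapping each (parent, child) edge to its cost
    groups:   list of sets of leaves
    """
    new_children = {u: list(vs) for u, vs in children.items()}
    new_cost = dict(cost)
    new_children.setdefault(root, [])

    # Attach K - 1 dummy leaves to the root with zero-cost edges.
    dummies = [("dummy", i) for i in range(K - 1)]
    for d in dummies:
        new_children[root].append(d)
        new_children[d] = []
        new_cost[(root, d)] = 0

    # Every group gains all dummy leaves and has its requirement raised to K.
    new_groups = [set(g) | set(dummies) for g in groups]
    requirements = [K] * len(groups)
    return new_children, new_cost, new_groups, requirements


# Hypothetical instance: root "r" with two leaves, one per group.
children = {"r": ["a", "b"], "a": [], "b": []}
cost = {("r", "a"): 3, ("r", "b"): 5}
groups = [{"a"}, {"b"}]
tree, new_cost, new_groups, reqs = group_to_covering(children, cost, groups, "r", 3)
print(new_groups, reqs)
```

Since each group contains only $K - 1$ dummy leaves, meeting a requirement of $K$ forces at least one original group vertex into the tree, which is why a solution of the new instance yields one for the original instance at no larger cost.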

We are thus led to the question: can the Covering Steiner problem be
approximated as well as the Group Steiner problem? The following theorem answers
this question in the affirmative:

Theorem 1. There is a randomized polynomial-time approximation algorithm
for the Covering Steiner problem which, with high probability, produces an
approximation of: (i) $O((\log N) \cdot (\log k))$ for tree instances, and (ii)
$O((\log n) \cdot (\log N) \cdot (\log k))$ for general instances.

This implies an improvement of 0((logn)/loglogn) can be obtained over 
previous results in some situations; indeed, this is achieved when, say, k = 
log^’*’®*'^^ n and K = (The reason we take k ^ log^ n in this example 

is that the problem on trees is easily approximable to within k; so, if k were 
small, we would have a good approximation algorithm anyway.) 

The bounds for tree instances are essentially the best possible in the following
asymptotic sense: the paper of Halperin and Krauthgamer [7] shows that, for any
constant $\epsilon > 0$, an $O((\log(n + k))^{2-\epsilon})$-approximation algorithm for the Covering
Steiner problem implies that $NP \subseteq ZTIME[\exp((\log n)^{O(1)})]$. Furthermore,
we adopt a natural linear programming (LP) relaxation considered by Garg et
al. [5] and Konjevod et al. [9]; it has been shown that the integrality gap of this
relaxation for tree-instances of Group Steiner is $\Omega((\log k) \cdot (\log N)/\log\log N)$ [6].

1.1 Our Techniques 

Our approach to solve the Covering Steiner problem is to iteratively round an
LP relaxation of the problem suggested by Konjevod et al. [9]. Given a partial
solution to the problem, we consider the fractional solution of the LP for the
current residual problem, and extend the partial solution using either a randomized
rounding approach or a direct deterministic approach.

Informally, in order to do better than the approach of Konjevod et al. [9], the
main technical issue we handle is as follows. Let OPT denote the optimal solution
value of a given tree instance. In essence, each iteration of [9] adds an
expected cost of $O(\mathrm{OPT} \cdot \log N)$ to the solution constructed so far, and reduces
the total requirement by a constant fraction (in expectation). Since the initial
total requirement is at most $k \cdot K$, we thus expect to run for $O(\log k + \log K)$
iterations, resulting in a total cost of $O(\mathrm{OPT} \cdot (\log k + \log K) \log N)$ with high
probability. To eliminate the $\log K$ term from their approximation guarantee,
we show, roughly, that every iteration has to satisfy one of two cases: (i) a
good fraction of the groups have their requirement cut by a constant factor
via a threshold rounding scheme, while paying only a cost of $O(\mathrm{OPT})$, or (ii) a
randomized rounding scheme is expected to fully cover a constant fraction of
the groups. A chip game presented in Section 2 is used to bound the number of
iterations where case (i) holds; Janson’s inequality [8] and some deterministic
arguments are used for case (ii). We do not attempt to optimize our constants.

2 A Chip Game 

Consider the following chip game. We initially have chips arranged in $k$ groups,
with each group $i$ having some number $n_i^{(init)} \le K$ of chips. Chips are iteratively
removed from the groups in a manner described below; a group is called active
if the number of chips in it is nonzero. The game proceeds in rounds as follows.
Let $n_i$ denote the number of chips in group $i$ at the start of some round. Letting
$A$ denote the number of groups currently active, we may choose any set of at
least $\lceil A/2 \rceil$ active groups; for each of these chosen groups $i$, we remove at least
$\lceil n_i/2 \rceil$ chips (causing the new number of chips in group $i$ to be at most $\lfloor n_i/2 \rfloor$).
Informally, we choose roughly half of the currently active groups, and halve their
sizes. (In the analysis below, it will not matter that these two constants are 1/2
each; any two constants $a, b \in (0, 1)$ will lead to the same asymptotic result.)

Due to these chips being removed, some groups may become inactive (i.e., 
empty); once a group has become inactive, it may be removed from consideration. 
The game proceeds until there are no chips left. The following lemma will be 
useful later. 

Lemma 1. The maximum number of rounds possible in the above chip game is
$O((\log k) \cdot (\log K))$.

Before we prove this lemma, let us note that this bound is tight, and
$\Theta((\log k) \cdot (\log K))$ rounds may be required. Suppose all the groups start off
with exactly $K$ chips. We first repeatedly keep choosing the same set of $\lceil k/2 \rceil$
groups for removing chips, and do this until all these groups become inactive;
we need $\Theta(\log K)$ rounds for this. We are now left with about $k/2$ active groups.
Once again, we repeatedly keep removing chips from the same set of about $k/4$
active groups, until all of these become inactive. Proceeding in this manner, we
can go for a total of $\Theta((\log k) \cdot (\log K))$ rounds.

Proof (Lemma 1). To bound the number of rounds, we proceed as follows. We
first round up the initial number of chips $n_i^{(init)}$ in each group to the closest power
of 2 greater than it. Next, we may assume that in each round, each chosen active
group $i$ has its number of chips reduced by the least amount possible; i.e.,
to exactly $\lfloor n_i/2 \rfloor$. This ensures that each active group always has a number of
chips that is a power of two. We can now modify the game into an equivalent
format. We initially start with $1 + \log n_i^{(init)}$ chips in each group $i$; once again, a
group is active if and only if its number of chips is nonzero. Now, in any round,
we choose at least half of the currently-active groups, and remove one chip from
each of them. Note that this simple “logarithmic transformation” does not cause
the maximum number of rounds to decrease, and hence we can analyze the game
on this transformed instance. Let $N_1$ be the number of rounds in which at least
$k/2$ groups are active. In each of these rounds, at least $k/4$ chips get removed.
However, since the total number of chips is at most $k(1 + \log K)$, we must have
$N_1 \le 4(1 + \log K)$. Proceeding in this manner, we see that the total number of
rounds is $O((\log k) \cdot (\log K))$, as claimed.
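
The slow strategy described before the proof is easy to simulate. The sketch below is an illustration under stated assumptions: every group starts with exactly $K$ chips, each round picks the first $\lceil A/2 \rceil$ active groups in index order (so the same groups are hit until they empty), and each chosen group is reduced to exactly $\lfloor n_i/2 \rfloor$ chips. The function name is hypothetical.

```python
def play_slowly(k, K):
    """Count the rounds of the chip game under the stalling strategy."""
    chips = [K] * k
    rounds = 0
    while any(chips):
        active = [i for i, n in enumerate(chips) if n > 0]
        # Choose ceil(A/2) active groups, always the lowest-indexed ones.
        chosen = active[: (len(active) + 1) // 2]
        for i in chosen:
            chips[i] //= 2      # removes ceil(n/2) chips, leaving floor(n/2)
        rounds += 1
    return rounds


print(play_slowly(8, 8))
print(play_slowly(16, 1024))
```

When $k$ and $K$ are powers of two, this strategy lasts exactly $(1 + \log k)(1 + \log K)$ rounds, for instance 16 rounds for $k = K = 8$, in line with the bound of Lemma 1.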

3 Algorithm and Analysis 

As mentioned in the introduction, we will assume that the input consists of a 
rooted tree, where the root r must be included in the output solution. Further- 
more, we make the following assumptions without loss of generality: every vertex 
belonging to a group is a leaf, and every leaf belongs to some group. 

3.1 The Basic Approach 

As in previous papers on the Covering Steiner and Group Steiner problems, 
the broad idea of the algorithm is to proceed in iterations, with each iteration 
producing a subtree rooted at r that provides some coverage not provided by 
the previous iterations. This process is continued until the required coverage is 
accomplished, and the union of all the trees constructed is returned as output. 

Consider a generic iteration; we will use the following notation. Let $r_i'$ denote
the residual requirement of group $g_i$; call $g_i$ active if and only if $r_i' > 0$, and let
$k'$ be the number of groups currently active. The leaves of $g_i$ already covered
in previous iterations are removed from $g_i$; abusing notation, we will refer to
this shrunk version of the group $g_i$ also as $g_i$. All the edges chosen in previous
iterations have their cost reduced to zero: we should not have to “pay” for them
if we choose them again. For any non-root node $u$, let $pe(u)$ denote the edge
connecting $u$ to its parent; for any edge $e$ not incident on the root, let $pe(e)$
denote the parent edge of $e$. Finally, given an edge $e = (u, v)$ where $u$ is the
parent of $v$, both $T(v)$ and $T(e)$ denote the subtree of $G$ rooted at $v$.

The following integer programming formulation for the residual problem was
proposed by Konjevod et al. [9], in which there is an indicator variable $x_e$ for
each edge $e$, to say whether $e$ is chosen or not. The constraints are: (i) for any
active group $g_i$, $\sum_{j \in g_i} x_{pe(j)} = r_i'$; (ii) for any edge $e$ and any active
group $g_i$, $\sum_{j \in (T(e) \cap g_i)} x_{pe(j)} \le r_i' \, x_e$; and (iii) for any edge $e$ not
incident on the root $r$, $x_{pe(e)} \ge x_e$. Finally, the objective is to minimize
$\sum_e c_e x_e$. By allowing each $x_e$ to lie in the real interval $[0, 1]$ as opposed to
the set $\{0, 1\}$, we get an LP relaxation.
Since chosen edges have their cost reduced to zero for all future iterations, it
is easy to see that the optimal value of this LP is a lower bound on OPT, the
optimal solution value to the original Covering Steiner instance.
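
To make the three constraints concrete, here is a small Python sketch that checks whether a given fractional assignment is feasible for the relaxation on a toy tree. The data layout (a `parent` map, edges identified by their child endpoint, groups as sets of leaves) and all names are assumptions chosen for illustration; the sketch only tests feasibility and does not solve the LP.

```python
def leaves_below(parent, v):
    """Leaves of the subtree T(v); `parent` maps each non-root node to its parent."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    stack, leaves = [v], set()
    while stack:
        u = stack.pop()
        kids = children.get(u, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.add(u)
    return leaves


def lp_feasible(parent, root, x, groups, req, tol=1e-9):
    """Check constraints (i)-(iii); x[v] is the value on the edge from parent[v] to v."""
    if any(x[v] < -tol or x[v] > 1 + tol for v in parent):
        return False
    for g, members in groups.items():
        # (i) the total flow into the leaves of the group equals its requirement
        if abs(sum(x[j] for j in members) - req[g]) > tol:
            return False
        # (ii) the flow into group leaves below an edge is at most requirement * x_e
        for v in parent:
            below = leaves_below(parent, v) & members
            if sum(x[j] for j in below) > req[g] * x[v] + tol:
                return False
    # (iii) an edge not incident on the root carries at most what its parent edge carries
    return all(x[v] <= x[u] + tol for v, u in parent.items() if u != root)


parent = {"a": "r", "b": "r", "l1": "a", "l2": "a", "l3": "b"}
groups = {"g": {"l1", "l2", "l3"}}
req = {"g": 2}
x_ok = {"a": 0.5, "b": 1.0, "l1": 0.5, "l2": 0.5, "l3": 1.0}
x_bad = dict(x_ok, a=0.4)   # edge a now carries too little for the leaves below it
print(lp_feasible(parent, "r", x_ok, groups, req))
print(lp_feasible(parent, "r", x_bad, groups, req))
```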

Each iteration will start with the residual problem, and solve the above LP 
optimally; it will then round this fractional solution as described below in Sec- 
tion 3.2. An interesting point to note is that as pointed out in [9], the integrality 
gap of the relaxation considered can be quite large: when we write the LP relax- 
ation for the original instance, the integrality gap can be arbitrarily close to K. 
Hence it is essential that we satisfy the requirements partially in each iteration, 
then re-solve the LP for the residual problem, and continue.

3.2 Rounding

We will henceforth use $\{x_e\}$ to denote an optimal solution to the LP relaxation; we
now show how to round it to partially cover some of the residual requirements.
In all of the discussion below, only currently active groups will be considered.
For any leaf $j$, we call $x_{pe(j)}$ the flow into $j$. Note that the total flow into the
leaves of a group $g_i$ is $r_i'$. For $\alpha > 1$, we define a group $g_i$ to be $(p, \alpha)$-covered if
a total flow of at least $p \, r_i'$ goes into the elements (i.e., leaves) of $g_i$ which receive
individual flow values of at least $1/\alpha$. An $\alpha$-scaling of the solution $\{x_e\}$ is the
scaled-up solution $\{\hat{x}_e\}$ where we set $\hat{x}_e = \min\{\alpha x_e, 1\}$, for all edges $e$.
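
The two definitions above translate directly into code. The following Python sketch is illustrative only: the function names and the dictionary-based representation of flows are assumptions, and the numbers are made up.

```python
def is_covered(flows, requirement, p, alpha):
    """Test whether a group is (p, alpha)-covered.

    flows maps each leaf of the group to the flow into it; only leaves whose
    individual flow is at least 1/alpha count towards the threshold.
    """
    heavy_flow = sum(f for f in flows.values() if f >= 1.0 / alpha)
    return heavy_flow >= p * requirement


def alpha_scaling(x, alpha):
    """Scale every edge value by alpha, capping at 1."""
    return {e: min(alpha * value, 1.0) for e, value in x.items()}


concentrated = {"j1": 0.5, "j2": 0.3, "j3": 0.1, "j4": 0.1}   # total flow 1
spread = {f"j{i}": 0.1 for i in range(10)}                    # total flow 1
print(is_covered(concentrated, 1, 0.5, 4))
print(is_covered(spread, 1, 0.5, 4))
print(alpha_scaling({"e": 0.3, "f": 0.1}, 4))
```

In the first group most of the flow sits on leaves with flow at least 1/4, so the group passes the (1/2, 4) test; in the second group no single leaf is heavy enough, so it fails.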

The iteration is as follows. Define $\mathcal{G}_1$ to be the set of active groups that are
$(1/2, 4)$-covered. We will consider two cases, based on the size of $\mathcal{G}_1$.

Case I: $|\mathcal{G}_1| \ge k'/2$. In this case, we simply perform a 4-scaling, and pick the
edges that are rounded up to 1. The solution returned in this case is just the
connected subtree consisting of the picked edges that contains the root. Note that
every group in $\mathcal{G}_1$ has at least half of its requirement covered by this process.
Indeed, since any such group $g_i$ is $(1/2, 4)$-covered, it must have at least $r_i'/2$ of
its member leaves receiving flows of value at least $1/4$. Thus, by this rounding,
at least half of the currently active groups have their requirements reduced by
at least half. Now Lemma 1 implies that the total number of iterations where
Case I holds is at most $O((\log k) \cdot (\log K))$. We pay a cost of at most $4 \cdot \mathrm{OPT}$ in
each such iteration, and hence

cost(Case I iterations) $= O((\log k) \cdot (\log K) \cdot \mathrm{OPT})$. (1)

(Let us emphasize again that throughout the paper, OPT refers to the cost
of the optimal solution for the original Covering Steiner instance.) Now suppose
Case I does not hold, and thus we consider:

Case II: $|\mathcal{G}_1| < k'/2$. In this case, let $\lambda = c' \log N$, for a sufficiently large absolute
constant $c'$. Let $\mathcal{G}_2$ be the set of active groups that are not $(3/4, \lambda)$-covered, and
let $\mathcal{G}_3$ be the set of active groups that do not lie in $\mathcal{G}_1 \cup \mathcal{G}_2$.

We now use a modification of the rounding procedure used previously by Garg
et al. [5] and Konjevod et al. [9]. For every edge $e$, define $x'_e = \min\{\lambda \cdot x_e, 1\}$ to
be the solution scaled up by $\lambda$. For every edge $e$ incident on the root, pick $e$ with
probability $x'_e$. For every other edge $e$ with its parent denoted $f$, pick $e$ with
probability $x'_e / x'_f$. This is performed for each edge independently. Let $H$ denote
the subgraph of $G$ induced by the edges that were picked, and let $T$ denote the
connected subtree of $H$ containing the root $r$; this is returned as the solution in
this case. Crucially, we now set the costs of all chosen edges to 0, so as to not
count their costs in future iterations. It is easy to verify that the probability of
the event $e \in T$ for some edge $e$ is $x'_e$; now linearity of expectation implies that

$E[\mathrm{cost}(T)] \le \lambda \cdot \mathrm{OPT}$. (2)
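
The rounding step just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the tree is given as a `parent` map, each edge is identified by its child endpoint, the scaling factor is passed as `lam`, and the input values satisfy constraint (iii) so that the conditional probabilities are well defined. The function name and the toy instance are hypothetical.

```python
import random


def round_tree(parent, root, x, lam, rng):
    """One run of the top-down rounding; returns the nodes whose parent edge lies in T.

    parent maps each non-root node to its parent; x[v] is the LP value of the
    edge entering v and is assumed to satisfy x[v] <= x[parent[v]].
    """
    scaled = {v: min(lam * x[v], 1.0) for v in parent}   # the scaled-up values
    picked = set()
    for v, u in parent.items():
        if scaled[v] <= 0.0:
            continue
        prob = scaled[v] if u == root else min(1.0, scaled[v] / scaled[u])
        if rng.random() < prob:                           # independent coin per edge
            picked.add(v)

    def reaches_root(v):
        # An edge belongs to T only if every edge above it was picked as well.
        while v != root:
            if v not in picked:
                return False
            v = parent[v]
        return True

    return {v for v in picked if reaches_root(v)}


parent = {"a": "r", "l1": "a", "l2": "a"}
x = {"a": 0.2, "l1": 0.1, "l2": 0.0}
rng = random.Random(0)
trials = 20000
hits = sum("l1" in round_tree(parent, "r", x, 2.0, rng) for _ in range(trials))
print(hits / trials)   # empirical frequency; the scaled value of edge l1 is 0.2
```

The probabilities along a root-to-edge path telescope, which is why the empirical frequency of an edge ending up in the returned subtree should approach its scaled value.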

We next analyze the expected coverage properties of this randomized rounding.
For any group $g_i$, let $g_i'$ be the subset of elements of $g_i$ which have individual
in-flow values of at most $1/\lambda$, and let $f_i$ denote the total flow into the elements
of $g_i'$. We want a lower bound on $X_i$, the number of elements of $g_i'$ that get
covered by the above randomized rounding; for this, we plan to employ Janson's
inequality [8].

It is easy to see that $\mu_i = E[X_i] = f_i \lambda$. Suppose $j, j' \in g_i'$ are two leaves in $g_i'$.
We say that $j \sim j'$ if and only if (i) $j \ne j'$ and (ii) the least common ancestor
of $j$ and $j'$ in $G$ is not the root $r$. If $j \sim j'$, let $lca(j, j')$ denote the least common
ancestral edge of $j$ and $j'$ in $G$. To use Janson's inequality, define

$\Delta_i = \sum_{j, j' \in g_i' : \, j \sim j'} \frac{x'_{pe(j)} \, x'_{pe(j')}}{x'_{lca(j, j')}}$.

If $f_i \ge r_i'/4$, then Janson's inequality shows that if $c'$ is large enough, then

$\Pr[X_i \ge r_i'] \ge 1 - \exp\left(-\Omega\left(\frac{\mu_i}{2 + \Delta_i/\mu_i}\right)\right)$. (3)

Furthermore, a calculation from [9] can be used to show that
$\Delta_i \le O((r_i')^2 \log^2 N)$. Plugging this into (3), along with the facts that $\mu_i = f_i \lambda$, and
that $\lambda = c' \log N$, it follows that there is an absolute constant $c'' \in (0, 1)$ such
that

if $f_i \ge r_i'/4$, then $\Pr[X_i \ge r_i'] \ge c''$. (4)

Recall that $\mathcal{G}_2$ consisted of all the groups $g_i$ which were not $(3/4, \lambda)$-covered.
Hence, at least $r_i'/4$ flow into $g_i$ goes into leaves with in-flow at most $1/\lambda$, and
hence $f_i$ is at least $r_i'/4$. This satisfies the condition in (4), and hence the expected
number of groups in $\mathcal{G}_2$ that get all of their requirement covered in this iteration
is at least $c'' \cdot |\mathcal{G}_2|$.

Next consider any $g_i \in \mathcal{G}_3$; by definition, $g_i$ must be $(3/4, \lambda)$-covered, but not
$(1/2, 4)$-covered. In other words, if we consider the leaves of $g_i$ whose individual
flow values lie in the range $[1/\lambda, 1/4]$, the total flow into them is at least
$(3/4 - 1/2) r_i' = r_i'/4$. Hence there are at least $r_i'$ such leaves in the group $g_i$. Finally,
since all these leaves have individual flow values of at least $1/\lambda$, all of them get
chosen with probability 1 into our random subtree $T$, and hence every group in
$\mathcal{G}_3$ has all of its requirement satisfied with probability 1.

To summarize, the expected number of groups whose requirement is fully
covered, is at least

$c'' \cdot |\mathcal{G}_2| + |\mathcal{G}_3| \ge c'' \cdot |\mathcal{G}_2| + c'' \cdot |\mathcal{G}_3| \ge c'' \cdot (k' - |\mathcal{G}_1|) \ge c'' k'/2$,

the last inequality holding since we are in Case II and $|\mathcal{G}_1| < k'/2$. Hence each
iteration of Case II is expected to fully satisfy at least a constant fraction of the
active groups. It is then easy to see (via an expectation calculation and Markov's
inequality) that for some constant $a$, the probability that not all groups are
satisfied after $a \log k$ iterations where Case II held, is at most 1/4.

Also, summing up the expected cost (2) over all iterations (using the linearity
of expectation), and then using Markov's inequality, we see that with probability
at least 1/2,

cost(Case II iterations) $\le O((\log k) \cdot (\log N) \cdot \mathrm{OPT})$. (5)

Combining (1) and (5) and noting that $K \le N$, we get the proof of Theorem 1.
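
For readers who want the last step spelled out, the following is a sketch of the Markov argument behind (5). It assumes that at most $a \log k$ Case II iterations are charged and writes $\lambda$ for the scaling factor $c' \log N$; the explicit factor 2 is an assumption of this sketch and is absorbed by the $O(\cdot)$ in (5).

```latex
\begin{align*}
E[\text{cost(Case II iterations)}]
  &\le (a \log k) \cdot \lambda \cdot \mathrm{OPT}
   = a\,c' (\log k)(\log N) \cdot \mathrm{OPT}
   && \text{by (2) and linearity of expectation,} \\
\Pr\bigl[\text{cost(Case II iterations)} > 2\,a\,c' (\log k)(\log N) \cdot \mathrm{OPT}\bigr]
  &\le \tfrac{1}{2}
   && \text{by Markov's inequality.}
\end{align*}
```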

4 Open Questions 

It would be nice to resolve the approximability of the Group Steiner and Cov- 
ering Steiner problems on general graph instances. Near-optimal approximation 
algorithms that are fast and/or combinatorial would also be of interest.

Abstract. A spatial logic consists of four groups of operators: standard
propositional connectives; spatial operators; a temporal modality;
calculus-specific operators. The calculus-specific operators talk about the
capabilities of the processes of the calculus, that is, the process constructors
through which a process can interact with its environment. We prove
some minimality results for spatial logics. The main results show that in
the logics for π-calculus and asynchronous π-calculus the calculus-specific
operators can be eliminated. The results are presented under both the
strong and the weak interpretations of the temporal modality. Our proof
techniques are applicable to other spatial logics, so as to eliminate some of
- if not all - the calculus-specific operators. As an example of this, we
consider the logic for the Ambient calculus, with the strong semantics.



1 Introduction 

Over the last 15 years, a lot of research has gone into calculi of mobile processes.
Among these, the π-calculus is the best known. A number of other calculi
have been put forward to study aspects of mobility not directly covered by the
π-calculus. Examples are: the Asynchronous π-calculus (Aπ), which uses
asynchronous communications, which are more common in distributed systems than
the synchronous communications of the π-calculus; the Ambient calculus and all
its variants, which extend the π-calculus with localities and movements of these.

At present, one of the most active research directions in the area of process 
mobility is that of spatial logics [2, 3, 5, 6]. These logics are used to reason about, 
and express properties of, systems of mobile processes. The logics can describe 
both the spatial distribution of the processes and their temporal evolutions. A 
spatial logic consists of four groups of operators: standard propositional con- 
nectives; spatial operators; a temporal modality; calculus-specific operators. We 
briefly comment on them below. 

The spatial operators allow us to express properties of the structure of a
process. These operators include tensor, $|$, and linear implication, $\triangleright$. The former
is used to separate a spatial structure into two parts: thus a process satisfies
formula $A_1 \mid A_2$ if the process can be decomposed into two subsystems satisfying
respectively $A_1$ and $A_2$. Operator $\triangleright$ is the adjunct of $|$: a process satisfies formula
$A_1 \triangleright A_2$ if, whenever put in parallel with a process satisfying $A_1$, the resulting
system satisfies $A_2$. Other spatial operators are the revelation operator, $\circledR$, and
the freshness operator, И; a combination of these operators gives us the logical
counterpart of restriction, the construct used in calculi of mobile processes to
create fresh names. Note that as an alternative to И, the standard universal
quantifier on names, which is more powerful than И, can be included in $\mathcal{L}$. This
is not needed for our purposes (except in Section 4; see below).

The temporal modality, $\Diamond$, can be interpreted strongly or weakly: in the
former case the number of reductions a process can perform is visible, in the
latter case it is not. Calculus-specific operators talk about the capabilities of the
processes of the calculus, that is, the process constructs through which a process
can interact with its environment. In the π-calculus, input and output are the
only capabilities. The output capability is the only primitive operator in the
spatial logic for the π-calculus [2], the input capability formula being derivable.

An important property of a formalism is conciseness: the formalism should 
have a small set of independent operators. Conciseness helps when developing 
the theory of the formalism and when studying its expressiveness. We call a result 
that reduces the number of operators - in a logic, or in a calculus - a minimality 
result, in the sense that it helps going in the direction of a minimal language. This 
terminology should not be misunderstood: we are interested in getting a smaller 
language, without necessarily proving that we obtain the smallest possible one. 
A minimality result can be useful in tools and implementations. For instance,
the possibility of encoding the operator of sum in the π-calculus [9] justifies its
absence in Pict's abstract machine [10].

In this paper we prove some minimality results for spatial logics. Our main
results show that, surprisingly, in the logics for π-calculus and Aπ all
calculus-specific operators can be eliminated. These results hold both under the strong
and under the weak semantics for $\Diamond$. The resulting common core spatial logic,
$\mathcal{L}$, has the following grammar:

$A ::= A_1 \wedge A_2 \mid \neg A \mid 0 \mid A_1 \mathbin{|} A_2 \mid A_1 \triangleright A_2 \mid n \circledR A \mid И n.\, A \mid \Diamond A$ .

Note that the operators of this logic give no information about the nature of
computation (whether it is based on synchronisation, what values are exchanged,
etc.). Further, it may be puzzling to see the same logic - same operators, same
interpretation - for π-calculus and Aπ, because their behavioural theories are
rather different. The point is that spatial logics are rather intensional, and do not
agree with the standard behavioural theories. These logics allow us to observe
the internal structure of the processes at a very fine-grained detail, much in the
same way as structural congruence does [11,7].

We do not claim that the common core logic $\mathcal{L}$ is universal, i.e., that it
can be used on many or all calculi of mobile processes. We think that, usually,
some calculus-specific operators are necessary. However we believe that our proof
techniques are applicable to other spatial logics, so as to eliminate some of - if not
all - the calculus-specific operators. As an example of this, we consider also the
case of Ambient-like calculi, under the strong semantics. The spatial logics for
Ambients [5,6] have two calculus-specific operators, called ambient and ambient
adjunct. We can derive some of, but not all, the capability formulas of Ambients
in $\mathcal{L}$. We can derive all of them if we add the ambient adjunct to $\mathcal{L}$: thus the
ambient formula can be eliminated from the logic in [5].

Our results suggest that spatial logics are more expressive than standard
modal logics. The latter do not have the spatial connectives. However, these
logics have more precise temporal connectives, the modalities. The only modality
of the spatial logics talks about the evolution of a system on its own. In standard
modal logics, by contrast, modalities also talk about the potential interactions
between a process and its environment. For instance, in the Hennessy-Milner
logic the modality $\langle \alpha \rangle. A$ is satisfied by the processes that can perform the action
$\alpha$ and become a process that satisfies $A$. The action $\alpha$ can be a reduction, but also
an input or an output. The formulas for the modalities are similar to those for
the capabilities discussed above; in general, in a spatial logic, they are derivable
from the capability formulas. (In the paper we focus on the capability formulas,
since they are more important in a spatial logic.)

For lack of space we do not present all the details of the proofs, and illustrate
them only in the case of the π-calculus (detailed justifications can be found
in [8]). When characterising a capability construct, we exploit operator $\triangleright$ to
build a scenario that allows the capability to show its effect. This approach
works rather smoothly under the strong interpretation of $\Diamond$, the constructions
becoming more involved under a weak interpretation. In the latter case, we rely
on some non-trivial properties, specific to each calculus we consider, to isolate
some kind of composite constituents of interactions, that we call threads. In π,
threads are sequences of input and output prefixes, while in Ambients they are
sequences of nesting of open prefixes and ambients. The use of operators $\circledR$ and
И is crucial to derive formulas characterising threads.

Paper outline. In Section 2, we introduce our core spatial logic. We then present
our results on the π-calculus (Section 3), and explain how we derive them. We
also mention, in less detail, our results on the asynchronous π-calculus and on
Mobile Ambients in Section 4. Section 5 gives some concluding remarks.

2 A Minimal Spatial Logic 

A spatial logic is defined on spatial calculi, that is, calculi of processes that have
the familiar constructs of parallel composition, restriction, and 0. These calculi
are equipped with the usual relations: structural congruence, $\equiv$ (with the usual
axioms for parallel composition, restriction, and 0), the one-step reduction
relation $\longrightarrow$, and the multistep reduction relation $\longrightarrow^*$ (the reflexive and transitive
closure of $\longrightarrow$). Restriction is a binder, therefore notions of free and bound names
of processes are also defined (for each process, the sets of free and bound names
are finite). $fn(P)$ stands for the set of free names of $P$. For two sets $S_1, S_2$ of
names, $S_1 \setminus S_2$ stands for the set of names that belong to $S_1$ and not to $S_2$.

We give some properties, definitions, and notations for spatial calculi. We use
$P, Q, \ldots$ to range over the processes, and $a, b, n, m, \ldots$ to range over the infinite
set $\mathcal{N}$ of names. A process $P$

- has normalised restrictions if, for any occurrence of a subterm $(\nu n) P'$ in $P$,
  name $n$ occurs free in $P'$ (that is, $P$ has no useless restriction);
- has a toplevel normalised restriction if $P \equiv (\nu n) P'$ and $n \in fn(P')$;
- is non-trivial if it is not structurally congruent to 0;
- is tight if, up to structural congruence, it only has one component; in other
  words, it is non-trivial and is not the composition of two non-trivial processes.
  For example, in the π-calculus, $a(n).b(n)$ and $(\nu a)(\bar{a}\langle b \rangle \mid a(n).c(n))$ are
  tight, while $c(a) \mid b(a)$ is not.

As a consequence of the axioms of structural congruence for restriction,
$n \in fn(P)$ holds iff $P \not\equiv (\nu n) P'$ for all $P'$. A (possibly empty) sequence of
restrictions will be written $(\nu \tilde{n}) P$.

Definition 1 (Core spatial logic $\mathcal{L}$). Formulas of the spatial logic $\mathcal{L}$, ranged
over by $A, B$, are defined by the following grammar:

$A ::= A_1 \wedge A_2 \mid \neg A \mid 0 \mid A_1 \mathbin{|} A_2 \mid A_1 \triangleright A_2 \mid n \circledR A \mid И n.\, A \mid \Diamond A$ .

The set of free names of a formula $A$ (written $fn(A)$) is defined by saying
that the only binding operator is И. We write $A(n \leftrightarrow m)$ for the permutation
of names $n$ and $m$ in formula $A$. Given a spatial calculus $\mathcal{C}$, the temporal
construct of the logic can be interpreted both strongly, that is, using the one-step
reduction relation of the calculus, or weakly, that is, using multistep reduction.
We write $P \models^{s}_{\mathcal{C}} A$ for the strong interpretation of $A$, and $P \models^{w}_{\mathcal{C}} A$ for the weak
interpretation. We use $sw$ as a variable that ranges over $\{s, w\}$.



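Definition 2 (Satisfaction in the spatial logic). On a spatial calculus $\mathcal{C}$,
satisfaction is defined by induction over the formulas as follows:

- $P \models^{sw}_{\mathcal{C}} A_1 \wedge A_2$ if $P \models^{sw}_{\mathcal{C}} A_1$ and $P \models^{sw}_{\mathcal{C}} A_2$;
- $P \models^{sw}_{\mathcal{C}} \neg A$ if not $P \models^{sw}_{\mathcal{C}} A$;
- $P \models^{sw}_{\mathcal{C}} 0$ if $P \equiv 0$;
- $P \models^{sw}_{\mathcal{C}} A_1 \mid A_2$ if there are $P_1, P_2$ s.t. $P \equiv P_1 \mid P_2$ and $P_i \models^{sw}_{\mathcal{C}} A_i$ for $i = 1, 2$;
- $P \models^{sw}_{\mathcal{C}} A_1 \triangleright A_2$ if for all $Q$ s.t. $Q \models^{sw}_{\mathcal{C}} A_1$, $P \mid Q \models^{sw}_{\mathcal{C}} A_2$;
- $P \models^{sw}_{\mathcal{C}} n \circledR A$ if there is $P'$ such that $P \equiv (\nu n) P'$ and $P' \models^{sw}_{\mathcal{C}} A$;
- $P \models^{sw}_{\mathcal{C}} И n.\, A$ if for any $n' \in \mathcal{N} \setminus (fn(P) \cup fn(A))$, $P \models^{sw}_{\mathcal{C}} A(n \leftrightarrow n')$;
- $P \models^{s}_{\mathcal{C}} \Diamond A$ if there is $P'$ such that $P \longrightarrow P'$ and $P' \models^{s}_{\mathcal{C}} A$ (we write this
  $P \longrightarrow P' \models^{s}_{\mathcal{C}} A$);
- $P \models^{w}_{\mathcal{C}} \Diamond A$ if there is $P'$ such that $P \longrightarrow^* P'$ and $P' \models^{w}_{\mathcal{C}} A$ (we write this
  $P \longrightarrow^* P' \models^{w}_{\mathcal{C}} A$).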



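Figure 1 presents some known formulas. $\vee$ and $\top$ are 'or' and 'true' operators;
$\Box$ is the 'always' operator, the dual of $\Diamond$; and $\blacktriangleright$ is the dual of $\triangleright$ (thus
$P \models^{sw}_{\mathcal{C}} A_1 \blacktriangleright A_2$ if there is $Q$ such that $Q \models^{sw}_{\mathcal{C}} A_1$ and $P \mid Q \models^{sw}_{\mathcal{C}} A_2$). The
models of formula 1 are tight terms. 2 is satisfied by the parallel composition of two
processes satisfying 1. Formula Free(a) says that $a$ occurs free. We sometimes
write $Free(\tilde{a}, \neg\tilde{b})$ to abbreviate $\bigwedge Free(a) \wedge \bigwedge \neg Free(b)$ for $a \in \tilde{a}$ and $b \in \tilde{b}$,
that is, names $\tilde{a}$ are free, and $\tilde{b}$ are not.

$A_1 \vee A_2 \stackrel{\mathrm{def}}{=} \neg(\neg A_1 \wedge \neg A_2)$
$\Box A \stackrel{\mathrm{def}}{=} \neg \Diamond \neg A$
$A_1 \blacktriangleright A_2 \stackrel{\mathrm{def}}{=} \neg(A_1 \triangleright \neg A_2)$
$\top \stackrel{\mathrm{def}}{=} 0 \vee \neg 0$
$1 \stackrel{\mathrm{def}}{=} \neg 0 \wedge \neg(\neg 0 \mid \neg 0)$
$2 \stackrel{\mathrm{def}}{=} (\neg 0 \mid \neg 0) \wedge \neg(\neg 0 \mid \neg 0 \mid \neg 0)$
$Free(a) \stackrel{\mathrm{def}}{=} \neg\, a \circledR \top$
$public \stackrel{\mathrm{def}}{=} \neg И a.\, a \circledR (\neg\, a \circledR \top)$
$single \stackrel{\mathrm{def}}{=} 1 \wedge public$

Fig. 1. Some spatial formulas

The models of public are the processes
that are not structurally congruent to a process of the form $(\nu n) P$ with
$n \in fn(P)$, in other words processes having no toplevel normalised restriction. We call
such terms public processes. Finally, the models of single are those processes
that are tight and do not exhibit a normalised restriction at toplevel. We call
these the single processes. In the π-calculus, they are prefixed terms (Lemma 2).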
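
The formulas 1 and 2 of Fig. 1 use only the static operators, so they can be evaluated mechanically on a toy model. The Python sketch below assumes (this is a simplification, not the paper's semantics) that a process is a finite tuple of opaque tight components, with no restriction, replication, or reduction, so that decomposing a process into two parallel parts amounts to splitting the tuple. The tuple encoding of formulas and the names `sat`, `ONE`, `TWO` are hypothetical.

```python
from itertools import product

ZERO = ("zero",)


def Not(a):
    return ("not", a)


def And(a, b):
    return ("and", a, b)


def Par(a, b):
    return ("par", a, b)


def splits(components):
    """All ways to divide the components between a left and a right part."""
    for mask in product((0, 1), repeat=len(components)):
        left = tuple(c for c, m in zip(components, mask) if m == 0)
        right = tuple(c for c, m in zip(components, mask) if m == 1)
        yield left, right


def sat(components, formula):
    """Satisfaction for the fragment with 0, negation, conjunction and tensor."""
    tag = formula[0]
    if tag == "zero":
        return len(components) == 0
    if tag == "not":
        return not sat(components, formula[1])
    if tag == "and":
        return sat(components, formula[1]) and sat(components, formula[2])
    if tag == "par":
        return any(sat(left, formula[1]) and sat(right, formula[2])
                   for left, right in splits(components))
    raise ValueError(tag)


NONZERO = Not(ZERO)
ONE = And(NONZERO, Not(Par(NONZERO, NONZERO)))
TWO = And(Par(NONZERO, NONZERO), Not(Par(NONZERO, Par(NONZERO, NONZERO))))

print(sat(("p",), ONE), sat(("p", "q"), ONE))
print(sat(("p", "q"), TWO), sat(("p", "q", "r"), TWO))
```

In this model formula 1 holds exactly for one-component processes and formula 2 exactly for two-component ones, matching the reading given in the text.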

3 The Logic in the π-Calculus

3.1 The Process Calculus 

Definition 3. This is the grammar for the processes of the π-calculus:

$P ::= 0 \mid P_1 \mathbin{|} P_2 \mid (\nu n) P \mid\ !P \mid \alpha.P, \qquad \alpha ::= a(b) \mid \bar{a}\langle b \rangle$ .

Here and in the remainder of the paper, we sometimes call this the synchronous
π-calculus, to distinguish it from the asynchronous π-calculus studied below.

The subject of a prefix $m(a).P$ or $\bar{m}\langle a \rangle.P$ is $m$. We omit the trailing 0 in $\alpha.0$.
The set of free names of a process $P$ is defined by saying that restriction and input
are binding operators. We write $P\{b \leftarrow a\}$ for the capture-avoiding substitution of
name $b$ with name $a$ in $P$. Figure 2 presents the structural congruence and
reduction relations for the π-calculus.
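
The definition of free names can be made concrete with a small abstract syntax. The sketch below is an assumed Python encoding of the grammar of Definition 3 (all class and field names are hypothetical); `fn` treats restriction and input as the only binders, as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Par:
    left: object
    right: object


@dataclass(frozen=True)
class Res:          # restriction (nu name) body
    name: str
    body: object


@dataclass(frozen=True)
class Bang:         # replication !body
    body: object


@dataclass(frozen=True)
class In:           # input prefix subject(param).cont, binds param in cont
    subject: str
    param: str
    cont: object


@dataclass(frozen=True)
class Out:          # output prefix on subject carrying obj, then cont
    subject: str
    obj: str
    cont: object


def fn(p):
    """Free names of a process; restriction and input are the binders."""
    if isinstance(p, Nil):
        return frozenset()
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, Res):
        return fn(p.body) - {p.name}
    if isinstance(p, Bang):
        return fn(p.body)
    if isinstance(p, In):
        return frozenset({p.subject}) | (fn(p.cont) - {p.param})
    if isinstance(p, Out):
        return frozenset({p.subject, p.obj}) | fn(p.cont)
    raise TypeError(type(p).__name__)


# (nu a)( a<b> | a(n).c<n> ): the restricted name a and the input-bound name n are not free
example = Res("a", Par(Out("a", "b", Nil()), In("a", "n", Out("c", "n", Nil()))))
print(sorted(fn(example)))
```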

Definition 4. A thread is a process given by the following grammar:

$Thr ::= \alpha.Thr \mid \alpha.0$ ($\alpha$ is as by Definition 3).

If $\tilde{\alpha}$ is a (possibly empty) sequence of prefixed actions, such as $\alpha_1.\alpha_2.\,\cdots\,.\alpha_n$,
and $P$ is a thread $\tilde{\alpha}.P'$, then $P$ can perform actions $\tilde{\alpha}$ and then become $P'$. We
indicate this using the notation $P \xrightarrow{\tilde{\alpha}} P'$. We define a dualisation operation
over prefixes by setting $\overline{a(b)} = \bar{a}\langle b \rangle$ and $\overline{\bar{a}\langle b \rangle} = a(b)$. This induces a similar
operation $\overline{P}$ on processes.

Lemma 1 (Properties of π-calculus reductions).

1. Let $P, Q$ be single and s.t. $P \mid Q \longrightarrow 0$; then there are names $a, b$ s.t. either
   $P \equiv \bar{a}\langle b \rangle.0$ and $Q \equiv a(b).0$, or vice versa ($Q \equiv \bar{a}\langle b \rangle.0$ and $P \equiv a(b).0$).

2. For any thread $P$ we have $P \mid \overline{P} \longrightarrow^* 0$.
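
The second item of Lemma 1 can be replayed on small examples. The sketch below assumes a thread is encoded as a list of prefixes `(kind, subject, object)` with `kind` either `"in"` or `"out"`; this encoding and the helper names are illustrative only. Since threads are purely sequential, the only possible reduction of two threads in parallel is a communication between their head prefixes, which is what `step` performs. The substitution is deliberately naive: it raises an error instead of renaming when a name would be captured.

```python
def dual(thread):
    """Swap inputs and outputs in every prefix of the thread."""
    flip = {"in": "out", "out": "in"}
    return [(flip[kind], subj, obj) for kind, subj, obj in thread]


def substitute(thread, old, new):
    """Replace the free occurrences of `old` by `new` in a thread (no alpha-renaming)."""
    if old == new:
        return list(thread)
    result = []
    for i, (kind, subj, obj) in enumerate(thread):
        subj = new if subj == old else subj
        if kind == "in":
            if obj == new:
                raise ValueError("substitution would capture " + new)
            result.append((kind, subj, obj))
            if obj == old:                      # `old` is rebound from here on
                return result + list(thread[i + 1:])
        else:
            result.append((kind, subj, new if obj == old else obj))
    return result


def step(p, q):
    """One communication between the head prefixes of p and q, or None if impossible."""
    if not p or not q:
        return None
    (kind_p, subj_p, obj_p), (kind_q, subj_q, obj_q) = p[0], q[0]
    if subj_p != subj_q or kind_p == kind_q:
        return None
    if kind_p == "in":                          # p receives obj_q for its bound name
        return substitute(p[1:], obj_p, obj_q), list(q[1:])
    return list(p[1:]), substitute(q[1:], obj_q, obj_p)


def reduces_to_zero(p, q):
    """True if the parallel composition of the two threads can reduce to 0."""
    while p or q:
        nxt = step(p, q)
        if nxt is None:
            return False
        p, q = nxt
    return True


thread = [("in", "a", "x"), ("out", "x", "b"), ("in", "c", "y")]
print(reduces_to_zero(thread, dual(thread)))
print(reduces_to_zero(thread, thread))
```

A thread in parallel with its dual always finds a matching partner at each step, whereas a thread in parallel with itself is stuck immediately because both heads have the same polarity.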

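We write $\models^{sw}_{\pi}$ for the satisfaction relations in the synchronous π-calculus (we
recall that $sw$ ranges over $\{s, w\}$).

Structural congruence:

$P \mid 0 \equiv P$    $P \mid Q \equiv Q \mid P$    $P \mid (Q \mid R) \equiv (P \mid Q) \mid R$
$!P \equiv\ !P \mid P$    $!(P \mid Q) \equiv\ !P \mid\ !Q$    $!!P \equiv\ !P$    $!0 \equiv 0$
$(\nu n) 0 \equiv 0$    $(\nu n)(\nu m) P \equiv (\nu m)(\nu n) P$    $(\nu n)(P \mid Q) \equiv P \mid (\nu n) Q$ if $n \notin fn(P)$

Reduction:

$a(b).P \mid \bar{a}\langle c \rangle.Q \longrightarrow P\{b \leftarrow c\} \mid Q$

$\frac{P \equiv P' \quad P \longrightarrow Q \quad Q \equiv Q'}{P' \longrightarrow Q'}$    $\frac{P \longrightarrow P'}{P \mid Q \longrightarrow P' \mid Q}$    $\frac{P \longrightarrow P'}{(\nu n) P \longrightarrow (\nu n) P'}$

Fig. 2. π-calculus: structural congruence and reduction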



3.2 Main Results 

We show the derivability of the logical formulas for the capabilities of the
π-calculus (the input and output prefixes), as expressed by the following theorems:

Theorem 1 (Capabilities, strong case). For any $A, n, m$, there exist formulas
$in^{s}(m, n).A$ (with $n \ne m$) and $out^{s}(m, n).A$ such that, for any $P$:

- $P \models^{s}_{\pi} in^{s}(m, n).A$ iff there are $P', n'$, $n' \notin fn(A)$, s.t. $P \equiv m(n').P'$ and
  $P' \models^{s}_{\pi} A(n \leftrightarrow n')$;
- $P \models^{s}_{\pi} out^{s}(m, n).A$ iff there is $P'$ s.t. $P \equiv \bar{m}\langle n \rangle.P'$ and $P' \models^{s}_{\pi} A$.

Theorem 2 (Capabilities, weak case). For any $A, n, m$, there exist formulas
$in^{w}(m, n).A$ (for $n \ne m$) and $out^{w}(m, n).A$ such that, for any $P$:

- $P \models^{w}_{\pi} in^{w}(m, n).A$ iff there are $P', P'', n'$, $n' \notin fn(A)$, s.t. $P \equiv m(n').P'$
  and $P' \longrightarrow^* P'' \models^{w}_{\pi} A(n \leftrightarrow n')$;
- $P \models^{w}_{\pi} out^{w}(m, n).A$ iff there are $P', P''$ s.t. $P \equiv \bar{m}\langle n \rangle.P'$ and
  $P' \longrightarrow^* P'' \models^{w}_{\pi} A$.

These formulas easily allow us to define, in the strong and weak cases:

- the characteristic formulas for finite terms w.r.t. logical equivalence;
- the modality formulas for the input and output actions. For instance, in the
  weak case, the formula for the output modality $\langle \bar{a}\langle b \rangle \rangle. A$ is satisfied by any
  process that is liable to perform some reduction steps, emit name $b$ along $a$,
  and then perform some other reductions to reach a state where $A$ is satisfied.

We do not present these constructions in detail, because either they are variations 
on existing work [7], or they are simple on their own. 

3.3 Proofs 

We sketch the proofs of Theorems 1 and 2. We consider the strong case first,
since it is (much) simpler. The following lemma will be useful: it shows that
in the π-calculus the single processes are precisely the prefixed terms. The crux
of the proof of the theorems, however, especially in the weak case, will be the
definition of formulas to distinguish among different prefixes (whether the prefix
is an input or an output, which channels it uses, what is its continuation, etc.).

Lemma 2. For any $P$, $P \models^{sw}_{\pi} single$ iff $P \equiv m(n).P'$ or $P \equiv \bar{m}\langle n \rangle.P'$ for
some $m, n, P'$.

The strong case. In the strong case, the key formula is the following one.

$test(m, n) \stackrel{\mathrm{def}}{=} Free(m, n) \wedge (single \blacktriangleright \Diamond 0)$

Proposition 1. For any $P, n, m$ such that $n \ne m$, $P \models^{s}_{\pi} test(m, n)$ iff
$P \equiv \bar{m}\langle n \rangle$ or $P \equiv \bar{n}\langle m \rangle$.

Proof. We focus on the direct implication, the reverse direction being trivial.
By Lemma 1, $P$ is either of the form $\bar{a}\langle b \rangle.0$ or $a(x).0$. Having two distinct free
names, $P$ must be an output particle.

We can now define, using $test(m, n)$, the formulas of Theorem 1:

$in^{s}(m, n).A \stackrel{\mathrm{def}}{=} single \wedge И n.\, (test(m, n) \blacktriangleright \Diamond A)$
$out^{s}(m, n).A \stackrel{\mathrm{def}}{=} single \wedge И m'.\, (in^{s}(m, a).test(m', a) \triangleright \Diamond(test(m', n) \mid A))$

The formula for $in^{s}(m, n).A$ requires a process to be single, and moreover
the prefix should disappear in one step when in parallel with a certain tester
$test(m, n)$. The formula for $out^{s}(m, n).A$ is similar, exploiting the previous
formula $in^{s}(m, n).A$; a test $test(m', a)$ is required to observe the emitted name;
this name instantiates $a$ and is different from $m'$ by construction.



The weak case. We first introduce formulas to isolate threads. This is achieved
using a 'testing scenario', where the candidate process for being a thread is tested
by putting in parallel a tester process. The latter should consume the tested part
in a reduction sequence along which no more than two single components are
observed. A subtle point is the ability, along the experiment, to distinguish the
tested from the tester; we use the following property of π-calculus reductions:

Lemma 3. Suppose that the following conditions hold: $P, Q, R_1, R_2$ are single
processes such that $P \mid Q \longrightarrow R_1 \mid R_2$, and there exist two distinct names $n$
and $m$ s.t. $\{m, n\} \subseteq fn(P) \setminus fn(Q)$, $\{m, n\} \subseteq fn(R_1) \setminus fn(R_2)$. Then there exists
a prefix $\alpha$ s.t. $P \equiv \alpha.R_1$ and $Q \equiv \overline{\alpha}.R_2$.

In our case, the tester process will be identified using two distinct names
($n, m$ below), that do not appear in the tested process, and that act as markers.

$tested(m, n) \stackrel{\mathrm{def}}{=} single \wedge Free(\neg m, \neg n)$
$tester(m, n) \stackrel{\mathrm{def}}{=} single \wedge Free(m, n)$
$dial(m, n, A) \stackrel{\mathrm{def}}{=} \Diamond(tester(m, n) \mid A) \wedge \Box(tester(m, n) \mid (tested(m, n) \vee A))$



The formula dial (for dialog) is supposed to be satisfied by the composition
of the tester and the tested processes. Intuitively, dial requires that the
computation leads to a state where the tested process 'has disappeared', and that at
any moment along the computation either the tested process is still present, or
formula $A$ is satisfied. (This actually does not prevent the tester process from
'playing the role of the tested', once the tested has been consumed, but this will
be of no harm for our purposes.)

For $m, n, A$ fixed ($m \ne n$), we say that $(P, Q)$ is a pair tested/tester for
$A$, and write this $P \bowtie_A Q$, if $P \models^{w}_{\pi} tested(m, n) \vee A$, $Q \models^{w}_{\pi} tester(m, n)$, and
$P \mid Q \models^{w}_{\pi} dial(m, n, A)$. The following technical lemma, whose proof is based on
Lemma 3, describes the execution scenario: as long as $A$ is not satisfied, the tested
process cannot contribute to the 'tester' component; that is, it cannot fork into
two components one of which is used for the satisfaction of a 'tester' subformula
of dial. We write $(P, Q) \longrightarrow (P', Q')$ if either $P \equiv a(x).P_1$, $Q \equiv \bar{a}\langle b \rangle.Q'$,
$P \mid Q \longrightarrow P' \mid Q'$, and $P' = P_1\{b/x\}$, or the symmetric configuration.

Lemma 4. Assume $P \bowtie_A Q$ for some $m, n, A$. Then:

1. either $(P, Q) \longrightarrow (P_1, Q_1 \mid Q_2)$ for some $P_1, Q_1, Q_2$, and $P_1 \bowtie_A Q_1$,

2. or $P \models^{w}_{\pi} A$.

In the above scenario, taking $A = 0$ amounts to saying that the tested process
can disappear. We use this fact to define characteristic formulas for the threads.

Lemma 5. Take $A = 0$ in Lemma 4. For any single $P$ s.t. $\{m, n\} \cap fn(P) = \emptyset$,
there is $Q$ s.t. $P \bowtie_0 Q$ iff $P$ is a thread.

Proof. The proof relies on the fact that the scenario prevents the tested process
from forking into two subcomponents.

We now define the formula that captures threads. We also need two auxiliary
formulas to express existential and universal properties of suffixes of threads.

$Thread \stackrel{\mathrm{def}}{=} single \wedge И m, n.\, tester(m, n) \blacktriangleright dial(m, n, 0)$
$\langle\langle \_ \rangle\rangle.A \stackrel{\mathrm{def}}{=} Thread \wedge И m, n.\, (Thread \wedge tester(m, n)) \blacktriangleright dial(m, n, A)$
$[[\_]].A \stackrel{\mathrm{def}}{=} Thread \wedge \neg \langle\langle \_ \rangle\rangle.\neg A$

Lemma 6. The formulas above have the following interpretation:

- $P \models^{w}_{\pi} Thread$ iff $P$ is a thread.
- $P \models^{w}_{\pi} \langle\langle \_ \rangle\rangle.A$ iff $P$ is a thread such that there are $P'$ and some $\tilde{\alpha}$ with
  $P \xrightarrow{\tilde{\alpha}} P'$ and $P' \models^{w}_{\pi} A$.
- $P \models^{w}_{\pi} [[\_]].A$ iff $P$ is a thread s.t. whenever $P \xrightarrow{\tilde{\alpha}} P'$ for some $P', \tilde{\alpha}$,
  $P' \models^{w}_{\pi} A$.

Proof. The first property is a direct consequence of Lemma 5. The other two are
proved by induction on the size of the thread process being considered.

A thread ends with $\alpha$ if $\alpha$ is the last prefix in the thread. A thread is located at
$m$ if the subject of all its prefixes is $m$. We now refine our analysis of threads, by
introducing formulas that isolate located threads that end with a special prefix.

$Barb(m) \stackrel{\mathrm{def}}{=} single \wedge ((single \wedge \neg Free(m)) \triangleright \Box 2)$
$EndO(m, n) \stackrel{\mathrm{def}}{=} [[\_]].(0 \vee (Barb(m) \wedge Free(n)))$
$EndI(m) \stackrel{\mathrm{def}}{=} И n.\, [[\_]].(0 \vee (EndO(m, n) \blacktriangleright \Diamond 0))$
$OutOnly(m, n) \stackrel{\mathrm{def}}{=} EndO(m, n) \wedge И n'.\, [[\_]].(EndO(m, n') \triangleright \Box \neg EndO(m, n))$

Formula $Barb(m)$ captures single terms whose initial prefix has subject $m$.
This is obtained by requiring that such processes cannot interact with single
processes that do not know $m$. Formula $EndO(m, n)$ is satisfied by located
threads that end with the particle $\bar{m}\langle n \rangle$. With a 'dualisation argument' we define
$EndI(m)$ in terms of $EndO(m, n)$. Finally, formula $OutOnly(m, n)$ captures the
processes that satisfy $EndO(m, n)$ and that have no input prefix. For this, we
require that such processes are not able to 'consume' a thread having at least
one output (cf. the $EndO(m, n')$ subformula).

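Lemma 7. The formulas above have the following interpretation:

- $P \models^{w}_{\pi} Barb(m)$ iff $P \equiv m(n).P'$ or $P \equiv \bar{m}\langle n \rangle.P'$ for some $n$ and $P'$.
- For $n \ne m$, $P \models^{w}_{\pi} EndO(m, n)$ iff $P$ is a thread located at $m$ ending with
  $\bar{m}\langle n \rangle$ with $n$ not bound in $P$.
- $P \models^{w}_{\pi} EndI(m)$ iff $P$ is a thread located at $m$ and ending with $m(x)$.
- For $n \ne m$, $P \models^{w}_{\pi} OutOnly(m, n)$ iff $P$ is of the form
  $\bar{m}\langle c_1 \rangle. \ldots .\bar{m}\langle c_r \rangle.\bar{m}\langle n \rangle$ for some $r \ge 0$ and $(c_i)_{1 \le i \le r}$.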

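The last important step before defining the formula for Theorem 2 is the
definition of some formulas that characterise certain 'particle' terms (Lemma 8).
We use the same notation for these processes and the corresponding formulas:

$\bar{m}\langle n \rangle \stackrel{\mathrm{def}}{=} OutOnly(m, n) \wedge (EndI(m) \triangleright \Box \neg EndO(m, n))$
$m(n) \stackrel{\mathrm{def}}{=} Thread \wedge И n.\, (\bar{m}\langle n \rangle \triangleright \Diamond 0)$
$m(n).\bar{p}\langle q \rangle \stackrel{\mathrm{def}}{=} Thread \wedge И n.\, (\bar{m}\langle n \rangle \triangleright \Diamond \bar{p}\langle q \rangle)$
$\bar{m}\langle n \rangle.\bar{p}\langle q \rangle \stackrel{\mathrm{def}}{=} Thread \wedge (m(n) \triangleright \Diamond \bar{p}\langle q \rangle)$

The definition of formula $\bar{m}\langle n \rangle$ imposes that a process satisfying formula
$OutOnly(m, n)$ has only one prefix: if this is not the case, then there exists an
'$EndI(m)$ process' that can be consumed, leading to an $EndO(m, n)$ term.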

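Lemma 8. Let $A$ be one of the formulas above, where $m, n, p, q$ are distinct
names, and $Q_A$ be the corresponding term. Then for any $P$, $P \models^{w}_{\pi} A$ iff $P \equiv Q_A$.

$out^{w}(m, n).A \stackrel{\mathrm{def}}{=} И p, q.\, ( m(n).\bar{p}\langle q \rangle \triangleright \Diamond(\bar{p}\langle q \rangle \mid A) )$
$in^{w}(m, n).A \stackrel{\mathrm{def}}{=} И n, p, q.\, ( \bar{m}\langle n \rangle.\bar{p}\langle q \rangle \triangleright \Diamond(\bar{p}\langle q \rangle \mid A) )$

In both cases a flag process $\bar{p}\langle q \rangle$ is used to detect when the 'revealing'
reduction step has occurred, since in the weak semantics the number of reductions is
not observable.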

4 The Logic in Other Calculi 

The constructions we have shown above can be adapted to obtain similar results
for two calculi that are quite different from the (synchronous) π-calculus: the
asynchronous π-calculus and Mobile Ambients.

When moving to a different language, we cannot directly apply the formulas
and the proofs examined on the π-calculus. The main reason is that the syntax
changes, which affects our constructions, for instance when a formula $A_1 \triangleright A_2$
is used (operator $\triangleright$ talks about contexts of the calculus). Indeed, there are,
for example, processes that cannot be distinguished in the logic (i.e., they satisfy
the same sets of formulas of $\mathcal{L}$) when they are taken as processes of Aπ, but that
can be distinguished when they are taken as processes of the π-calculus. This is
the case, for instance, for processes $a(x).(\overline{a}\langle x\rangle \mid a(x).P)$ and $a(x).P$.

In this section, we present our results on Aπ and on Mobile Ambients. For lack
of space, we only briefly hint at some aspects of the derivation of the formulas.
Detailed constructions and proofs are available in [8].

4.1 Results in the Asynchronous π-Calculus

In Aπ there is no continuation underneath the output prefix:

$P ::= \mathbf{0} \;\big|\; P_1 \mid P_2 \;\big|\; (\nu n)\,P \;\big|\; !P \;\big|\; n(m).P \;\big|\; \overline{n}\langle m\rangle$.

We omit the resulting modifications to the operational semantics. We write $\models_{A\pi}$
for the satisfaction relations in Aπ. Below are the main results for Aπ, showing
the derivability of the capability formulas for the strong and the weak semantics.

Theorem 3 (Capabilities, strong case). For any A, n, m, there exist formulas
$\mathrm{in}^{s}(m,n).A$ (with the additional requirement $n \neq m$) and $\mathrm{out}^{s}(m,n)$ such
that, for any P:

- $P \models_{A\pi} \mathrm{in}^{s}(m,n).A$ iff there are $P'$, $n'$, $n' \notin \mathrm{fn}(A)$, s.t. $P \equiv m(n').P'$ and
$P' \models_{A\pi} A(n \leftrightarrow n')$;

- $P \models_{A\pi} \mathrm{out}^{s}(m,n)$ iff $P \equiv \overline{m}\langle n\rangle$.

Theorem 4 (Capabilities, weak case). For any A, n, m, there exist formulas
$\mathrm{in}^{w}(m,n).A$ (for $n \neq m$) and $\mathrm{out}^{w}(m,n)$ such that, for any P:

- $P \models_{A\pi} \mathrm{in}^{w}(m,n).A$ iff there are $P'$, $P''$, $n'$, $n' \notin \mathrm{fn}(A)$ s.t. $P \equiv m(x).P'$
and $P \mid \overline{m}\langle n'\rangle \longrightarrow^{*} P'' \models_{A\pi} A(n \leftrightarrow n')$;

- $P \models_{A\pi} \mathrm{out}^{w}(m,n)$ iff $P \equiv \overline{m}\langle n\rangle$.

As a consequence of Theorem 3, the output capability can be removed from the
logic in [2]. To derive the capability formulas in Aπ, we proceed in a way that
is quite similar to what we did in the π-calculus. It turns out that asynchrony
actually simplifies our proofs: certain constructions become simpler, and certain
results sharper. For instance, formula Thread from Section 3 directly captures
output particles (intuitively because output particles cannot play the role of the
tester in the scenario explained above).

4.2 Results in Mobile Ambients 

The calculus of Mobile Ambients [4] is a model where the basic computational
mechanism is movement, rather than communication as in π. The calculus of [4]
also includes communications, which for simplicity we have omitted here.

Definition 5. The grammar of Mobile Ambients is the following:

$P ::= \mathbf{0} \;\big|\; P_1 \mid P_2 \;\big|\; (\nu n)\,P \;\big|\; !P \;\big|\; n[P] \;\big|\; \alpha.P, \qquad \alpha ::= \mathrm{open}\ n \;\big|\; \mathrm{in}\ n \;\big|\; \mathrm{out}\ n$.

The structural congruence rules for mobile ambients are the same as in π,
plus the following rule to allow restrictions to cross ambient boundaries:

$n[(\nu m)\,P] \equiv (\nu m)\,n[P]$ if $n \neq m$. (1)

Instead of π’s communication rule, we have the following rules for ambients:

$\mathrm{open}\ n.P \mid n[Q] \longrightarrow P \mid Q$
$m[\mathrm{in}\ n.P \mid Q] \mid n[R] \longrightarrow n[m[P \mid Q] \mid R]$
$n[m[\mathrm{out}\ n.P \mid Q] \mid R] \longrightarrow n[R] \mid m[P \mid Q]$

We write $\models_{MA}$ for strong satisfaction in MA. We only have a partial
characterisation for the capability formulas for Ambients:

Theorem 5 (Capabilities, strong case). For any A, n, there exist formulas
$\mathrm{amb}^{s}(n, A)$ and $\mathrm{open}^{s}(n).A$ such that, for any P:

- $P \models_{MA} \mathrm{amb}^{s}(n, A)$ iff there is $P'$ public s.t. $P \equiv n[P']$ with $P' \models_{MA} A$;

- $P \models_{MA} \mathrm{open}^{s}(n).A$ iff there is $P'$ s.t. $P \equiv \mathrm{open}\ n.P'$ with $P' \models_{MA} A$.

The limitation about P' being public when deriving the ambient capability
is related to the ability for restrictions to cross ambients in the structural rule
(1) above. We believe that, on the other hand, the capability formulas for in and
out are not derivable in $\mathcal{L}$. Theorem 5 allows us to obtain characteristic formulas
for the finite processes of the dialect of Ambients studied in [1], which does not
have the in and out capabilities.

We can derive the missing capability formulas for in n and out n, and remove
the constraints on the ambient capability formula from Theorem 5, in a variant
of $\mathcal{L}$ enriched with the operator of ambient adjunct, @, whose satisfaction rule
is:



$P \models_{MA} A@n$ if $n[P] \models_{MA} A$

The logics for ambients in the literature [5,6] include both an ambient
construct and an ambient adjunct as primitive operators. As a consequence of
this result, the former operator can be removed, at least in the strong case.
(In the weak case, Theorem 5 remains valid, but we do not know whether the
construction that eliminates the ambient formula can be adapted.)

It is worth pointing out that in $\mathcal{L}$ (that is, without the ambient adjunct) we
can derive the formulas for the modalities corresponding to the capability
formulas of Theorem 5, without constraints on processes being public. For instance,
the ‘ambient modality’ is a formula $\langle\mathrm{amb}^{s}(n)\rangle.A$ such that $P \models_{MA} \langle\mathrm{amb}^{s}(n)\rangle.A$
if $P \equiv (\nu\tilde{m})\,(n[P'] \mid Q)$ with $n \notin \tilde{m}$ and $(\nu\tilde{m})\,(P' \mid Q) \models_{MA} A$ (the weak
modality is similar).

For lack of space we do not present the formal statement and the proofs of 
these results (see [8]). 



5 Conclusion 

We have shown that with a minimal spatial logic, $\mathcal{L}$, that has no
calculus-specific operators, we can derive the formulas for capabilities and
modalities in the π-calculus, both for the strong and for the weak semantics.
Remarkably, the logic $\mathcal{L}$ does not say anything about the features of the
underlying calculus, other than that processes can be put in parallel and names
can be restricted. To test the robustness of our techniques we have also
considered the calculi Aπ and Ambients. As Ambients show, sometimes not all the
capability and modality operators are derivable from $\mathcal{L}$: the addition of some
calculus-specific constructs may be needed for this. Still, our constructions may
allow us to reduce the number of such operators in the grammar of the logic.

The derivability of capability formulas is also useful for the definition of
characteristic formulas w.r.t. logical equivalence, along the lines of [7]. We have
not considered communication in Mobile Ambients, but reasoning as in [11]
would allow us to handle this extension. Similarly, the π-calculus syntax (in the
synchronous case) sometimes includes an operator of choice; we believe that we
could adapt our constructions to this extension.

We do not know whether our results can be useful in tools (for model checking,
for instance); perhaps our constructions are too complex for this at present.
However, since they allow us to reduce the number of operators, our results
should be important in the study of metatheoretical properties of the logics.

Our work shows that a logic with spatial constructions and the parallel
composition adjunct ($\triangleright$) can express modalities. Conversely, it would be
interesting to see whether the adjunct is derivable when the capability and/or
the modality formulas are primitive in the logic.

Abstract. We study an optimization problem that arises in the context
of data placement in a multimedia storage system. We are given a
collection of $M$ multimedia objects (data items) that need to be assigned to a
storage system consisting of $N$ disks $d_1, d_2, \ldots, d_N$. We are also given sets
$U_1, U_2, \ldots, U_M$ such that $U_i$ is the set of clients seeking the $i$th data item.
Data item $i$ has size $s_i$. Each disk $d_j$ is characterized by two parameters,
namely, its storage capacity $C_j$ which indicates the maximum total size
of data items that may be assigned to it, and a load capacity $L_j$ which
indicates the maximum number of clients that it can serve. The goal is
to find a placement of data items to disks and an assignment of clients
to disks so as to maximize the total number of clients served, subject to
the capacity constraints of the storage system.

We study this data placement problem for homogeneous storage systems
where all the disks are identical. We assume that all disks have a
storage capacity of $k$ and a load capacity of $L$. Previous work on this
problem has assumed that all data items have unit size, in other words
$s_i = 1$ for all $i$. Even for this case, the problem is NP-hard. For the case
where $s_i \in \{1, \ldots, \Delta\}$ for some constant $\Delta$, we develop a polynomial
time approximation scheme (PTAS). This result is obtained by developing
two algorithms, one that works for constant $k$ and one that works
for arbitrary $k$. The algorithm for arbitrary $k$ guarantees a solution
in which at least a
$\frac{k-\Delta}{k+\Delta}\left(1 - \frac{1}{\left(1+\sqrt{k/(2\Delta)}\right)^{2}}\right)$
fraction of all clients is assigned to a disk. In addition we develop an
algorithm for which we can prove tight bounds when $s_i \in \{1,2\}$. In
particular, we can show that a
$\left(1 - \frac{1}{\left(1+\sqrt{\lfloor k/2 \rfloor}\right)^{2}}\right)$
fraction of all clients can be assigned, regardless of the input distribution.



1 Introduction 

We study a data placement problem that arises in the context of multimedia
storage systems. In this problem, we are given a collection of $M$ multimedia
objects (data items) that need to be assigned to a storage system consisting of
$N$ disks $d_1, d_2, \ldots, d_N$. We are also given sets $U_1, U_2, \ldots, U_M$ such that $U_i$ is the set
of clients seeking the $i$th data item. Each data item has size $s_i$. Each disk $d_j$ is
characterized by two parameters, namely, its storage capacity $C_j$ which indicates
the maximum storage capacity for data items that may be placed on it, and its
load capacity $L_j$ which indicates the maximum number of clients that it can
serve. The goal is to find a placement of data items to disks and an assignment
of clients to disks so as to maximize the total number of clients served, subject
to the capacity constraints of the storage system.

The data placement problem described above arises naturally in the context
of storage systems for multimedia objects where one seeks to find a placement
of the data items such as movies on a system of disks. The main difference
between this type of data access problem and traditional data access problems
is that in this situation, once assigned, the clients will receive multimedia data
continuously and will not be queued. Hence we would like to maximize the
number of clients that can be assigned/admitted to the system. We study this
data placement problem for uniform storage systems, or a set of identical disks
where $C_j = k$ and $L_j = L$ for all disks $d_j$.

In the remainder of this paper, we assume without loss of generality that
(i) the total number of clients does not exceed the total load capacity, i.e.,
$\sum_{i=1}^{M} |U_i| \leq N \cdot L$, (ii) the total size of data items does not exceed the total
storage capacity, i.e., $\sum_{i=1}^{M} s_i \leq N \cdot k$, and (iii) if $M_p$ is the number of data items
of size $p$ then $M_p \leq N \lfloor k/p \rfloor$, since at most $\lfloor k/p \rfloor$ items of size $p$ can be stored
on a single disk.
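
The three assumptions can be checked mechanically before running any placement algorithm. The sketch below is only an illustration: the function name `satisfies_assumptions` and the list-based encoding of an instance (`demands[i]` for $|U_i|$, `sizes[i]` for $s_i$) are assumptions made here, not notation from the paper.

```python
from collections import Counter


def satisfies_assumptions(demands, sizes, num_disks, k, L):
    """Return True if an instance meets assumptions (i)-(iii).

    demands[i] is |U_i| and sizes[i] is s_i (hypothetical encoding);
    num_disks is N, k the storage capacity and L the load capacity.
    """
    # (i) total number of clients does not exceed the total load capacity
    if sum(demands) > num_disks * L:
        return False
    # (ii) total size of data items does not exceed the total storage capacity
    if sum(sizes) > num_disks * k:
        return False
    # (iii) at most floor(k / p) items of size p fit on a single disk
    for p, count in Counter(sizes).items():
        if count > num_disks * (k // p):
            return False
    return True
```

An instance that violates one of the checks can be trimmed first (for example by dropping the least demanded items), which is why the assumptions are without loss of generality.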

In [5,10] this problem is studied with the assumption that all data items
have unit size, namely $s_i = 1$ for all data items, and even this case is NP-hard
for homogeneous disk systems [5]. In this work, we generalize this problem
to the case where we can have non-uniform sized data items. For the previous
algorithms [5,10] the assumption that all items have the same size is crucial.

For arbitrary $k$ and when $s_i \in \{1,2\}$ (this corresponds to the situation when
we have two kinds of movies, standard and large), we develop a generalization
of the sliding-window algorithm [10], called SW-Alg, using multiple lists, that
has the following property. For any input distribution that satisfies the size
requirements mentioned above, we can show that the algorithm guarantees that
at least a $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$ fraction of the clients can be assigned to a disk. Note
that $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$ approaches 1 as $k$ increases, and is at least $\frac{3}{4}$. This bound
holds for $k \geq 2$. While this bound is trivial when $k$ is even, the proof is quite
complicated for odd $k$. In addition, we show that this bound is tight. In other
words there are instances where no placement of data items can guarantee a
better bound as a function of $k$. In fact, this suggests that when $s_i \in \{1, \ldots, \Delta\}$
we should get a bound of $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/\Delta \rfloor})^{2}}\right)$. We can argue that this is the best
bound possible, by providing a family of instances for which no more than a
$\left(1 - \frac{1}{(1+\sqrt{\lfloor k/\Delta \rfloor})^{2}}\right)$ fraction of items can be packed. We have verified this
conjecture when $\Delta = 2$, by providing an algorithm that achieves this bound. Several
different approaches that we tried yielded non-trivial bounds; however, none
of them is tight. This is because the analysis needs a very careful accounting
of the wasted space on the disks. By creating a separate list of the items of size
1, we are able to avoid situations where we end up wasting space on a disk when
$k$ is odd, because we are only left with size 2 items.
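
To get a feel for the guarantee, the following sketch evaluates the bound numerically, assuming it reads $1 - 1/(1+\sqrt{\lfloor k/2 \rfloor})^{2}$ as in the text above. The function name `tight_bound` is an illustrative choice, not a name used in the paper.

```python
import math


def tight_bound(k):
    """Guaranteed fraction of assigned clients for sizes in {1, 2},
    assuming the bound 1 - 1 / (1 + sqrt(floor(k / 2)))^2."""
    return 1.0 - 1.0 / (1.0 + math.sqrt(k // 2)) ** 2


if __name__ == "__main__":
    for k in (2, 3, 8, 50, 1000):
        print(k, round(tight_bound(k), 4))
```

Because the floor is taken, an odd storage capacity $k$ gives the same guarantee as $k - 1$, which is one way to see why the odd case is the delicate one.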

For the more general problem when $s_i \in \{1, \ldots, \Delta\}$, we develop a method
(SW-Alg2) that works with a single list of all the items, sorted in non-decreasing
density order. This algorithm has the property that at least an

$f(k,\Delta) = \frac{k-\Delta}{k+\Delta}\left(1 - \frac{1}{\left(1+\sqrt{k/(2\Delta)}\right)^{2}}\right)$

fraction of all clients are assigned. When $s_i \in \{1, \ldots, \Delta\}$ for some constant
$\Delta$, we develop a polynomial time approximation
scheme (PTAS) as follows. For a given $\epsilon > 0$, if $(1 - \epsilon) \leq f(k, \Delta)$ then we can use
SW-Alg2 to get the desired result. If $(1 - \epsilon) > f(k, \Delta)$, then $k$ is a fixed constant
(as a function of $\epsilon$ and $\Delta$) and we can use an algorithm whose running time
is polynomial for fixed $k$. In fact, this algorithm works when $s_i \in \{a_1, \ldots, a_c\}$
for any fixed constant $c$. This generalizes the algorithm presented in [5], which
is for the case when all $s_i = 1$. While the high level approach is the same, the
algorithm is significantly more complex in dealing with lightly loaded disks. For
any fixed integers $k$, $\Delta$ and $\epsilon > 0$ this algorithm runs in polynomial time and
outputs a solution where at least a $(1 - \epsilon)$ fraction of the clients are assigned.
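
The case split behind the PTAS can be written down directly. The sketch below assumes the reading $f(k,\Delta) = \frac{k-\Delta}{k+\Delta}\bigl(1 - 1/(1+\sqrt{k/(2\Delta)})^{2}\bigr)$; the function names `f` and `choose_algorithm` are hypothetical, and the returned strings merely name the two algorithms of the paper rather than invoke them.

```python
import math


def f(k, delta):
    """Guarantee of SW-Alg2 under the assumed form of f(k, delta)."""
    ratio = (k - delta) / (k + delta)
    return ratio * (1.0 - 1.0 / (1.0 + math.sqrt(k / (2.0 * delta))) ** 2)


def choose_algorithm(k, delta, eps):
    """Pick the algorithm that achieves a (1 - eps) fraction.

    If SW-Alg2 already guarantees 1 - eps, use it; otherwise k is bounded
    by a constant depending on eps and delta, and the scheme that is
    polynomial for fixed k (Apx-Scheme) applies.
    """
    return "SW-Alg2" if 1.0 - eps <= f(k, delta) else "Apx-Scheme"
```

For example, with $\Delta = 2$ and $\epsilon = 0.1$, a very large $k$ falls into the SW-Alg2 case, while $k = 4$ gives $f(4,2) = 0.25$ and so falls into the fixed-$k$ case.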

At this point, it is worth noting that while there is a PTAS for the problem for 
a constant number of distinct sizes (Section 6 of this paper, and the independent 
work in [8]), even for the simplest case when the data items have unit sizes (for 
example the first PTAS in [5]), none of the approximation schemes are actually 
practical since the running times are too high, albeit polynomial for a fixed e. 
The only known algorithms that are practical, are the ones based on the sliding 
window approach. Hence even though the bounds that one can derive using 
sliding window based methods can be improved by other approaches, this still 
remains the best approach to tackling the problem from a practical standpoint. 
Obtaining a practical PTAS remains open. 



1.1 Related Work 

The data placement problem described above bears some resemblance to the 
classical multi-dimensional knapsack problem [7,2]. However, in our problem, 
the storage dimension of a disk behaves in a non-aggregating manner in that 
assigning additional clients corresponding to a data item that is already present 
on the disk does not increase the load along the storage dimension. It is this 
distinguishing aspect of our problem that makes it difficult to apply known 
techniques for multi-dimensional packing problems. 

Shachnai and Tamir [10] studied the above data placement problem for unit
sized data items, when all $s_i = 1$; they refer to it as the class constrained multiple
knapsack problem. The authors gave an elegant algorithm, called the sliding
window algorithm, and showed that this algorithm packs all items whenever
$\sum_{j=1}^{N} C_j \geq M + N - 1$. An easy corollary of this result is that one can always
pack a $\left(1 - \frac{1}{1+k}\right)$-fraction of all items. The authors [10] showed that the problem
is NP-hard when each disk has an arbitrary load capacity, and unit storage.
Golubchik et al. [5] establish a tight upper and lower bound on the number of
items that can always be packed for any input instance to homogeneous storage
systems, regardless of the distribution of requests for data items. It is always
possible to pack a $\left(1 - \frac{1}{(1+\sqrt{k})^{2}}\right)$-fraction of items for any instance of identical
disks. Moreover, there exists a family of instances for which it is infeasible to
pack any larger fraction of items. The problem with identical disks is shown to
be NP-hard for any fixed k > 2 [5].

In addition, packing problems with color constraints are studied in [4,9]. 
Here items have sizes and colors, and items have to be packed in bins, with the 
objective of minimizing the number of bins used. In addition, each item has a 
color and there is a constraint on the number of items of distinct colors in a bin. 
For a constant total number of colors, the authors develop a polynomial time 
approximation scheme. In our application, this translates to a constant number 
of data items (M), and is too restrictive an assumption. 

Independently, Shachnai and Tamir [8] have recently announced a result
similar to the one presented in Section 6. For any fixed $\epsilon$ and for a constant number
of sizes $s_i \in \{a_1, \ldots, a_c\}$ and for identical parallel disks they develop a
polynomial time approximation scheme where the running time is polynomial in $N$ and
$M$, the number of disks and data items. Since this does not assume constant $k$,
they do not need a separate algorithm when $k$ is large. However, the algorithms
and the ideas in their work are based on a very different approach as compared
to the ones taken in this paper.



1.2 Other Issues 

Once a placement of items on the disks has been obtained, the problem of assign- 
ing clients to disks can be solved optimally by solving a network flow instance. 
Our algorithm computes a data placement and an assignment, however it is pos- 
sible that a better assignment can be obtained for the same placement by solving 
the appropriate flow problem. (For the unit size case this is not an issue since 
we can show that the assignment is optimal for the placement that is produced 
by the sliding window algorithm.) 
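
One way to set up the flow instance for a fixed placement is sketched below: a source feeds each data item with capacity $|U_i|$, an item is connected to every disk that stores it, and each disk feeds the sink with capacity $L$. The value of a maximum flow is then the largest number of clients that this placement can serve. The network layout and the name `max_clients_served` are assumptions made for illustration; the augmenting-path routine is a plain Edmonds-Karp implementation, not code from the paper.

```python
from collections import defaultdict, deque


def max_clients_served(demands, placement, L):
    """Maximum number of clients servable for a fixed placement.

    demands[i] is |U_i|; placement[j] is the set of items stored on disk j
    (hypothetical encoding); L is the load capacity of every disk.
    """
    cap = defaultdict(lambda: defaultdict(int))
    src, sink = "s", "t"
    for i, d in enumerate(demands):
        cap[src][("item", i)] = d
    for j, items in enumerate(placement):
        cap[("disk", j)][sink] = L
        for i in items:
            cap[("item", i)][("disk", j)] = demands[i]

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
```

Since all capacities are integers, the resulting flow is integral and can be read off as an assignment of whole clients to disks.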

Another important issue concerns the input size of the problem. The input
parameters are $N$, the number of disks, and $M$ ($\leq Nk$), the total number of
movies. Since only the cardinalities of the sets $U_i$ are required, we assume each
of these can be specified in $O(\log |U_i|)$ bits. In other words, our algorithms run
in time polynomial in these parameters and are not affected by exceptionally
large sets $U_i$, assuming we can manipulate these values in constant time.



1.3 Main Results 

When data items have size $s_i \in \{1, 2\}$, we develop a generalization of the Sliding
Window Algorithm (SW-Alg) using multiple lists, and prove that it guarantees




Algorithms for Non-uniform Size Data Placement on Parallel Disks 






that at least a $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$ fraction of clients will be assigned to a disk. Note
that this function is always at least $\frac{3}{4}$ and approaches 1 as $k$ goes to $\infty$. Moreover,
we can show that this bound is tight. In other words there are client distributions
for which no layout would give a better bound. Developing tight bounds for this
problem turns out to be quite tricky, and much more complex than the case where
all items have unit size. This already allows for understanding the fragmentation
effects due to imbalanced load as well as due to non-uniform item sizes. We were
able to develop several generalizations of the sliding window method, but it is
hard to prove tight bounds on their behavior.

In addition, we develop an algorithm (SW-Alg2) for which we can prove that
it guarantees that at least an

$f(k,\Delta) = \frac{k-\Delta}{k+\Delta}\left(1 - \frac{1}{\left(1+\sqrt{k/(2\Delta)}\right)^{2}}\right)$

fraction of clients will be assigned to a disk, when $s_i \in \{1, \ldots, \Delta\}$.

As mentioned earlier, by combining SW-Alg2 with an algorithm that runs 
in polynomial time for fixed k we can obtain a polynomial time approximation 
scheme. We develop an algorithm (Apx-Scheme) that takes as input parameter 
two constants k and e' and yields a (1 — e')^ approximation to the optimal 
solution, in time that is polynomial for fixed k and e'. Pick e' so that (1 — e')^ > 
(1 — e) and e' < ^ (we need this for technical reasons). In fact we can set 
e' = min(^,I — (1 — e)3). Use Apx-Scheme with parameters e' and k, both 
of which are constant for fixed e. This gives a polynomial time approximation 
scheme. 



2 Sliding Window Algorithm 

For completeness we describe the algorithm [10] that applies to the case of
identical disks with unit size items.

We keep the data items in a sorted list in non-decreasing order of the number
of clients requiring that data item, denoted by $R$. The list, $R[1], \ldots, R[m]$, $1 \leq
m \leq M$, is updated during the algorithm. At step $j$, we assign items to disk
$d_j$. For the sake of notation simplification, $R[i]$ always refers to the number of
currently unassigned clients for a particular data item (i.e., we do not explicitly
indicate the current step $j$ of the algorithm in this notation). We assign data
items and remove from $R$ the items whose clients are packed completely, and
we move the partially packed clients to their updated places according to the
remaining number of unassigned clients for that data item.

The assignment of data items to disk $d_j$ has the general rule that we want
to select the first consecutive sequence of $k$ or fewer data items, $R[u], \ldots, R[v]$,
whose total number of clients is at least the load capacity $L$. We then assign
items $R[u], \ldots, R[v]$ to $d_j$. In order to not exceed the load capacity, we will
break the clients corresponding to the last data item into two groups (this will
be referred to as splitting an item). One group will be assigned to $d_j$ and the
other group is re-inserted into the list $R$. It could happen that no such sequence
of items is available, i.e., all data items have relatively few clients. In this case,
we greedily select the data items with the largest number of clients to fill $d_j$. The
selection procedure is as follows: we first examine $R[1]$, which is the data item
with the smallest number of clients. If these clients exceed the load capacity, we
will assign $R[1]$ to the first disk and re-locate the remaining piece of $R[1]$ (which
for $R[1]$ will always be the beginning of the list). If not, we examine the total
demand of $R[1]$ and $R[2]$, and so on until either we find a sequence of items
with a sufficiently large number of clients ($\geq L$), or the first $k$ items have a
total number of clients $< L$. In the latter case, we go on to examine the next $k$
data items $R[2], \ldots, R[k + 1]$ and so on, until either we find $k$ items with a total
number of clients at least $L$ or we are at the end of the list, in which case we
simply select the last sequence of $k$ items which have the greatest total number
of clients.
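
The procedure just described can be sketched for unit size items as follows. This is a reading of the prose above, not the authors' code: the function name `sliding_window`, the encoding of the list $R$ as sorted `(remaining_clients, item)` pairs, and the returned per-disk dictionaries are all assumptions made for illustration.

```python
import bisect


def sliding_window(demands, num_disks, k, L):
    """Sliding window placement for unit size items (illustrative sketch).

    demands[i] is |U_i|. Returns one dict per disk mapping an item index
    to the number of its clients served by that disk.
    """
    # R: remaining items in non-decreasing order of unassigned clients.
    remaining = sorted((d, i) for i, d in enumerate(demands) if d > 0)
    placement = []
    for _ in range(num_disks):
        chosen = None
        # Find the first window of at most k consecutive items with load >= L.
        for u in range(len(remaining)):
            total = 0
            for v in range(u, min(u + k, len(remaining))):
                total += remaining[v][0]
                if total >= L:
                    chosen = (u, v, total)
                    break
            if chosen is not None:
                break
        if chosen is None:
            # No load-saturating window: take the k most demanded items.
            window = remaining[-k:]
            del remaining[-k:]
            placement.append({item: d for d, item in window})
            continue
        u, v, total = chosen
        window = remaining[u:v + 1]
        del remaining[u:v + 1]
        disk = {item: d for d, item in window}
        excess = total - L
        if excess > 0:
            # Split the last item: keep part on this disk, reinsert the rest.
            last_demand, last_item = window[-1]
            disk[last_item] = last_demand - excess
            bisect.insort(remaining, (excess, last_item))
        placement.append(disk)
    return placement
```

As a small example, with demands `[3, 4, 5, 8]`, two disks, $k = 2$ and $L = 10$, the first disk takes the items with 5 and 8 clients and splits the latter, and the second disk is filled greedily, so 17 of the 20 clients are served.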

The proof of the tight bound in [5] involves obtaining an upper bound on
the number of data items that were not packed in any disk, and upper-bounding
the number of clients for each such data item. By using this approach we cannot
obtain a tight bound for the case when the data items may have differing sizes.
One problem with such an algorithm is that it may pack several size 1 items
together, leaving out size 2 items for later, and when $k$ is odd, we may waste
space on a disk simply because we are left with only size 2 items and cannot
pack them perfectly.



3 Multi-list Sliding Window Algorithm 

Let $M_1$ be the number of size-1 items and $M_2$ be the number of size-2 items.
At any stage, let $m_1$ and $m_2$ be the number of size-1 and size-2 items on the
remaining items list (the list of items whose clients have not been assigned
completely). Here we only discuss the case when $k$ is odd, since there is a simple
reduction of the case when $k$ is even to the unit size case (as will be shown later).

The algorithm constructs and maintains three lists $L_1$, $L_2$ and aux-list. If
$M_1 < N$, then note that there are at least $N - M_1$ units of unused space in the
input instance. In that case, the algorithm adds $N - M_1$ dummy size-1 items
with zero load. The algorithm then sorts the size-1 items and the size-2 items in
non-decreasing order of demand in lists $L_1$ and $L_2$ respectively. The top $N$
size-1 items with the highest demand are moved into aux-list. The remaining size-1
items are kept in $L_1$. All the size-2 items are placed in the $L_2$ list. From this
stage on, the algorithm maintains the $L_1$, $L_2$ and aux-list lists in non-decreasing
order of demand.

For each disk (stage), the algorithm must make a selection of items from $L_1$,
$L_2$ and aux-list. Assume the lists are numbered starting from 1. Exactly one
item for the selection is always chosen from aux-list. The algorithm then selects
$w_1$ consecutive items from $L_1$ and $w_2$ consecutive items from $L_2$ such that the
total utilized space of the selected items from $L_1$ and $L_2$ is $\leq k - 1$ ($< k - 1$ if
we have an insufficient number of items, or the items have a very high density).

Define the wasted space of a selection to be the sum of the unused space
and the size of the item that must be split to make the selection load-feasible.







At each stage the algorithm makes a list of selections ($S$) by combining the
following selections (one from $L_2$, one from $L_1$ and one from aux-list). It selects
$w_2$, $0 \leq w_2 \leq \min(\lfloor k/2 \rfloor, m_2)$ consecutive size-2 items from $L_2$ at each of the
positions $1 \ldots (m_2 - w_2 + 1)$. It selects $w_1$, $0 \leq w_1 \leq \min(k - 2w_2 - 1, m_1)$ size-1
items from $L_1$ at each of the positions $1 \ldots (m_1 - w_1 + 1)$. It selects a size-1 item
from aux-list at each of the positions $1 \ldots |\text{aux-list}|$.
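
The enumeration of candidate selections can be sketched as a generator. Everything named here is hypothetical: `candidate_selections`, the plain lists of demands `l1`, `l2` and `aux` standing for $L_1$, $L_2$ and aux-list, and the 0-based start positions. One deliberate deviation is that an empty window ($w = 0$) is generated once rather than once per position, since all such choices are identical.

```python
def candidate_selections(l1, l2, aux, k):
    """Yield (load, space, (w2, p2, w1, p1, a)) for every candidate selection.

    l1, l2, aux hold the remaining demands of the items in L_1, L_2 and
    aux-list, each in non-decreasing order; positions are 0-based.
    """
    def starts(length, w):
        # An empty window is the same wherever it starts, so emit it once.
        return range(length - w + 1) if w > 0 else range(1)

    for w2 in range(min(k // 2, len(l2)) + 1):
        for p2 in starts(len(l2), w2):
            for w1 in range(min(k - 2 * w2 - 1, len(l1)) + 1):
                for p1 in starts(len(l1), w1):
                    for a in range(len(aux)):
                        load = (sum(l2[p2:p2 + w2])
                                + sum(l1[p1:p1 + w1])
                                + aux[a])
                        space = 2 * w2 + w1 + 1  # one unit for the aux item
                        yield load, space, (w2, p2, w1, p1, a)
```

With `l1 = [1, 2]`, `l2 = [5]`, `aux = [7]` and `k = 3`, this yields five selections, the most loaded being the size-2 item together with the aux-list item.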

If $\forall s \in S$, $load(s) < L$, the algorithm outputs the selection with highest load.
If $\exists s \in S$ where $load(s) \geq L$, then let $V'$ be the set of all the selections in $S$ with
load $\geq L$. Let $V \subseteq V'$ be the set of all the selections which can be made
load-feasible by allowing the split of either the highest size-2 item in the selection, or
the highest size-1 item from $L_1$ in the selection, or the size-1 item from aux-list
in the selection.

The algorithm chooses the $d \in V$ with minimum wasted space. The algorithm
outputs $d = \{d_1, \ldots, d_l\}$ where $d_l = d_l' + d_l''$, $load(d_1, \ldots, d_l) \geq L$ and
$load(d_1, \ldots, d_l') = L$. In the step above, the algorithm is said to split $d_l$. If $d_l'' > 0$
the algorithm then reinserts $d_l''$ (the broken off piece) into the appropriate
position in the list from which $d_l$ was chosen. If the broken off piece was reinserted
into aux-list, the algorithm shrinks the length of aux-list by one. The size-1 item
that leaves aux-list in the previous step is then reinserted into the appropriate
position of the $L_1$ list. If the broken off piece was reinserted into some other list
(other than aux-list) then note that the size of aux-list reduces by one anyway
since the item from aux-list is used up completely.



4 Analysis of the Algorithm 

For each disk in the system, the solution consists of an assignment of data items
along with an assignment of the demand (i.e., the clients for this item that are
assigned to the disk) for each of the items assigned to the disk. We will argue
that the ratio of packed demand ($S$) to total demand is at least $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$.

This bound is trivial to obtain for even $k$ as shown next. Most of this section
will focus on the case when $k$ is odd.

Further, this bound is tight, i.e., there are instances for which no solution can
pack more than a $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$ fraction of the total demand. The example
showing that the bound of $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$ on the fraction of packed demand
is tight is omitted due to space constraints and is available in the full version of
the paper [13].



4.1 Even k

Given an instance $I$, create a new instance $I'$ by merging pairs of size-1 items to
form size-2 items. If $M_1$ (the number of size-1 items in $I$) is odd, then we create
a size-2 item with the extra (dummy) size-1 item. Size-2 items in $I$ remain size-2
items in $I'$. Note that since $k$ is even, $I'$ will remain feasible although $M_1$ may
be odd. We now scale the sizes of the items in $I'$ by 1/2 and apply the sliding
window algorithm described in Section 2. The basic idea is to view a capacity $k$
disk as a capacity $k/2$ disk since each item has size 2. From the result of [5], we get
the desired bound of at least $\left(1 - \frac{1}{(1+\sqrt{k/2})^{2}}\right)$.
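
The reduction for even $k$ can be sketched in a few lines. The function name `reduce_even_instance` and the choice to pair size-1 items in sorted order are assumptions made here; the text only requires that size-1 items be merged in pairs, with a zero-demand dummy completing the last pair when their number is odd.

```python
def reduce_even_instance(size1_demands, size2_demands, k):
    """Turn a {1, 2}-size instance with even k into a unit size instance.

    Returns (demands, capacity): the demands of the merged items, each now
    treated as a unit size item, and the scaled disk capacity k / 2.
    """
    if k % 2 != 0:
        raise ValueError("the reduction applies only to even k")
    ones = sorted(size1_demands)
    if len(ones) % 2 == 1:
        ones.insert(0, 0)  # dummy size-1 item with zero demand
    merged = [ones[i] + ones[i + 1] for i in range(0, len(ones), 2)]
    return merged + list(size2_demands), k // 2
```

The output can then be handed to a sliding window routine for unit size items with storage capacity $k/2$; the total demand is unchanged by the merge.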



It is easy to use the above approach to obtain a bound of
$\left(1 - \frac{1}{k}\right)\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$ when $k$ is odd. However, this bound is not tight.



4.2 Odd k

The algorithm produces a set of load saturated disks at first, where the total load
is exactly $L$. The number of such disks will be referred to as $N_l$. The number of
disks with load less than $L$ will be $N_s$ (non load saturated disks). We will assume
that the minimum load on a non load saturated disk is $cL$ (in other words define
$c$ appropriately, so that each non load saturated disk has load at least $cL$). We
will refer to $us(i)$ as the utilized space on disk $d_i$. This is the total amount of
occupied space on a disk.

We will first bound the space wasted in packing the load-saturated disks and
then bound the space wasted in packing the non load-saturated disks to show
that the fraction of packed demand is at least $\left(1 - \frac{1}{(1+\sqrt{\lfloor k/2 \rfloor})^{2}}\right)$.

The algorithm works in stages, producing one window per stage which
corresponds to the assignment for a single disk. We know that, at any stage, if we
have at least one load saturated window, then the algorithm selects the window
with load $\geq L$ that is:

- Load-feasible with one split (i.e. the load of the window becomes $= L$ by
splitting at most one item) and

- Minimizes wasted space

$L_1$ is the list of $(M_1 - N)$ size-1 items, $L_2$ is the list of size-2 items, and
aux-list is the list of $N$ size-1 items with highest load.

If at any stage, both the $L_1$ and $L_2$ lists are empty while there are some items
remaining in the aux-list, since the number of items in the aux-list is equal to the
number of unpacked disks, they will be packed completely (this actually follows
from [10], see [5] for a simpler proof). Furthermore it is not hard to show that if
at any stage $j$, we have produced $j - 1$ load-saturated disks and the total size of
the objects in the $L_1$ and $L_2$ lists is $\leq k - 1$, then all the items will be packed
at the termination of the algorithm.

Lemma 1. When the current window has $us(i) = k - 1$ and a size 2 item is
split, then every leftmost window in the future of size $k - 2$ (not including the
split piece) has load $\geq L$.

This argues that the split piece of size 2 along with a chosen window of size
$k - 2$ will produce a load saturated disk. If again we split off a piece of size 2,
then repeatedly we will continue to output load saturated windows, until we run
out of items.

Lemma 2. When the current window has $us(i) \leq k - 2$ and an item is split,
then every leftmost window of the same size as the current window must have
load $\geq L$.

We next show that for each load saturated disk we have at most two units
of wasted space.

Lemma 3. If at the termination of the algorithm there are unassigned clients
then for every load-saturated disk $d_i$ one of the following conditions must hold:

1. Disk $d_i$ has $us(i) \geq k - 1$ and a size-1 item is split, or

2. Disk $d_i$ has $us(i) = k$ and a size-2 item is split.

Lemma 4. If at the termination of the algorithm there are unassigned clients
then either

1. All the non load-saturated disks are size-saturated.

2. Only size-2 items are remaining and there is at most one non load-saturated
disk with exactly one unit of unused space and all the other non load-saturated
disks are size-saturated.



Theorem 1. It is always possible to pack a
$\left(1 - \frac{1}{(1 + \sqrt{\lfloor k/2 \rfloor})^2}\right)$-fraction of items
for any instance.



5 Generalized Sliding Window Algorithm 



The sizes of the items in our instance are chosen from the set $\{1, \ldots, \Delta\}$.

In this section, we present an algorithm that guarantees to pack a 



k-A 

k+A 




fraction of clients for any valid problem instance. 



The algorithm works in two phases. In the first phase it produces a solution
for a set of $N$ disks each with storage capacity $k + \Delta$ and load capacity $L$. In
the second phase, the algorithm makes the solution feasible by dropping items
from these disks until the subset of items on each disk has size at most $k$ and
load at most $L$, respectively.

In the first phase, the algorithm keeps the items in a list
sorted in non-decreasing order of density $\rho_i$, where $\rho_i = l_i / s_i$, and $l_i$ and $s_i$ are the
load and size of item $i$. At any stage of the algorithm, this list will be referred
to as the list of remaining items.

For each disk, the algorithm attempts to find the first (from left to right in
the sorted list) “minimal” consecutive set of items from the remaining items list
such that the load of this set is at least $L$ and the total size of the items in the
set is at most $k + \Delta$. We call such a consecutive set of items a “minimal”
load-saturating set. The set is “minimal” because removing the item with highest
density (i.e., the rightmost item) from this set will cause the load of the set to
become less than $L$. Say the items in such a “minimal” set are some $x_u, \ldots, x_v$.
We have $\sum_{i=u}^{v} s_i \le k + \Delta$, and $u$ is the first index
where such a load-saturating set can be found. If a “minimal” load-saturating
set is found, then the algorithm breaks the highest density item in this set (i.e.,
$x_v$) into two pieces $x_{v'}$ and $x_{v''}$ such that $l_{v'} + \sum_{i=u}^{v-1} l_i = L$. The piece $x_{v''}$ is
reinserted into the appropriate position on the remaining items list.

If the algorithm is unable to find such a “minimal” load-saturating set, then
it outputs the last (from left to right) “maximal” consecutive set of the highest
density items from the remaining items list. We call such a set a “maximal”
non load-saturating set. Say the items in this “maximal” set are some $x_p, \ldots, x_q$
(where $x_q$ is the last item on the list of remaining items at this stage). The set is
“maximal” in the sense that $s_{p-1} + \sum_{i=p}^{q} s_i > k + \Delta$ (if $x_p$ is not the first item
in the list of remaining items) and $\sum_{i=p}^{q} s_i \le k + \Delta$. Since we know that the set
was not a load-saturating set, we have $\sum_{i=p}^{q} l_i < L$.

The algorithm outputs these sets as follows. Let the items on the remaining
items list be $x_1, \ldots, x_q$. For each disk, add item $x_1$ to the current selection.
Repeat the following steps until we find either a “minimal” load-saturating set
or a “maximal” non load-saturating set: Say the next item, that is, the item on
the remaining items list after the last item in current selection, at any stage
is $x_i$. If load(current selection) $< L$ and $s_i$ + size(current selection) $\le k + \Delta$,
then add $x_i$ to current selection. Else if load(current selection) $< L$ and $s_i$ +
size(current selection) $> k + \Delta$, drop the lowest density items from current
selection as long as $s_i$ + size(current selection) $> k + \Delta$, and then add $x_i$ to
current selection. Note that if load(current selection) $\ge L$ or $x_i = \emptyset$, then we
have found either a “minimal” load-saturating set or a “maximal” non
load-saturating set. If the algorithm finds a “minimal” load-saturating set then it
breaks off the highest density item in current selection (as described above),
reinserts the broken-off piece into the appropriate position on the remaining
items list and outputs the modified current selection. If the algorithm finds just
a “maximal” non load-saturating set, it simply outputs the current selection.
After the algorithm outputs a selection, these items are removed from the list
of remaining items. At the end of the first phase of the algorithm, each disk is
assigned either a “minimal” load-saturating set of items or a “maximal” non
load-saturating set of items.

In the second phase, for each disk, the algorithm drops the lowest density 
items assigned to the disk until the size of the packing is at most k. Since the load 
of the packing was feasible to begin with, at the end of this phase the algorithm 
produces a feasible solution. 
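
The second phase can be illustrated with a minimal sketch. It assumes that each item assigned to a disk is a hypothetical `(load, size)` pair with density load/size; the function name `drop_to_capacity` is illustrative and does not come from the paper.

```python
def drop_to_capacity(items, k):
    """Drop lowest-density items until the total size is at most k.

    items: list of (load, size) pairs assigned to one disk (assumed format).
    Returns the items that stay on the disk, in non-decreasing density order.
    """
    kept = sorted(items, key=lambda it: it[0] / it[1])  # ascending density
    total_size = sum(size for _, size in kept)
    while kept and total_size > k:
        _, size = kept.pop(0)          # remove the current lowest-density item
        total_size -= size
    return kept
```

Since items are only removed, the load on the disk can only decrease, which is why a load-feasible phase I packing stays load-feasible.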



Theorem 2. It is always possible to pack a 
clients for any valid input instance. 




fraction of 



Lemma 5. If $us(i) < k - \Delta$ for any load-saturated disk $i$ at the end of phase I
of the algorithm, then all items are packed at the end of phase I of the algorithm.
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Lemma 6. At the end of phase I of the algorithm, at least o 1 — 

fraction of clients are always packed. 

Let S be the total load of items packed at the end of phase II and let S be 
the total load of items packed at the end of phase I. 

Lemma 7. At the end of phase II of the algorithm, ^ 

Using these two lemmas, we easily obtain the proof of Theorem 2. 






6 Polynomial Time Approximation Schemes 



Note that 



f{k,A) > 



k- A 
k + A 




k-A 
k + A 



1 



k-A 

2A 



Thus algorithm SW-Alg2 can definitely pack a fraction of items 

for any valid problem instance. Also note that tends to 1 as 

k — >■ 00 . 

If 1 — e < ^I — then we can use Algorithm SW-Alg2 and get a 

solution within the desired error bounds. If 1 — e > then k is & 

constant {k < and we develop a PTAS for this case. This scheme is a 

generalization of the scheme developed in [5]. Algorithm Apx-Scheme takes as
input parameters $k$, $c$ and $\epsilon'$ and produces a solution that has an approximation
factor of $(1 - \epsilon')^3$, in time that is polynomial for fixed $\epsilon' > 0$ and integers $k, c$.
The sizes of the items are in the set $\{a_1, \ldots, a_c\}$ with $a_i \ge 1$. (If the sizes are
chosen from $\{1, \ldots, \Delta\}$ for some constant $\Delta$, then this is easily seen to be the
case.) To get a $(1 - \epsilon)$ approximation, we simply define $\epsilon' = 1 - (1 - \epsilon)^{1/3}$.

For technical reasons we will also need to assume that e' < If this is 
not the case, we simply lower the value of e' to ^ Since fc is a fixed constant, 
lowering the value of e' only yields a better solution, and the running time is 
still polynomial. 

The approximation scheme involves the following basic steps: 



1. Any given input instance can be approximated by another instance I' such 
that no data item in I' has an extremely high demand. 

2. For any input instance there exists a near-optimal solution that satisfies 
certain structural properties concerning how clients are assigned to disks. 

3. Finally, we give an algorithm that in polynomial time finds the near-optimal 
solution referred to in step (2) above, provided the input instance is as 
determined by step (1) above. 



Details of the approximation scheme are omitted due to space constraints
and are available in the full version of this paper [13].



Abstract. We consider the problem of determining if two finite groups
are isomorphic. The groups are assumed to be represented by their multiplication
tables. We present an algorithm that determines if two Abelian
groups with $n$ elements each are isomorphic. The running time of this algorithm
is $O(n \log p)$, where $p$ is the smallest prime not dividing $n$. When
$n$ is odd, this algorithm runs in linear time; in general, it takes time at
most $O(n \log\log n)$, improving upon an algorithm with $O(n \log n)$ running
time due to Vikas [13]. Our Abelian group isomorphism algorithm
is a byproduct of an algorithm that computes the orders of all elements
in any group (not necessarily Abelian) of size $n$ in time $O(n \log p)$, where
$p$ is the smallest prime not dividing $n$. We also give an $O(n)$ algorithm
for determining if a group of size $n$, described by its multiplication table,
is Abelian.



1 Introduction 

The group isomorphism problem is to determine whether two finite groups are
isomorphic. Two groups $G$ and $H$ are said to be isomorphic, written as $G \cong H$,
if there is a function $f : G \to H$ which is one-to-one and onto such that for
all $a, b \in G$, $f(ab) = f(a)f(b)$. Two isomorphic groups are essentially the same,
with elements renamed. In this paper, we consider the problem of determining if
two Abelian groups are isomorphic. We assume that the groups are represented
by their multiplication tables.

1.1 Background 

The group isomorphism problem is one of the fundamental problems in groups
and computation [2,6,9]. In contrast to graph isomorphism, very little is known
about the complexity of group isomorphism testing. The group isomorphism
problem can be reduced to that of directed graph isomorphism [10], so determining
group isomorphism is not harder than graph isomorphism. Hence, the
group isomorphism problem is not likely to be NP-hard. Tarjan [9] has shown
that this problem can be done in time $O(n^{\log n + O(1)})$ for groups of size $n$.
Lipton, Snyder and Zalcstein [6], independently of Tarjan, showed a stronger result
that group isomorphism can be solved in space $O(\log^2 n)$. They also proved that
the isomorphism of two finite Abelian groups can be tested in polynomial time.
Garzon and Zalcstein [2] showed that the isomorphism problem for a class of finite
groups containing the class of finite Abelian groups is in P. But the running
times of these algorithms are not explicitly given. Savage [12] claimed an algorithm
showing that if the groups are Abelian, isomorphism can be determined
in time $O(n^2)$. Vikas [13] improved this bound and gave an $O(n)$ algorithm for
Abelian p-group isomorphism and an $O(n \log n)$ algorithm for Abelian group
isomorphism. ($G$ is said to be a p-group if the size of $G$ is $p^m$, where $p$ is a
prime number and $m > 0$ is an integer.)

A related problem is the following: given a group $G$ in the form of a multiplication
table, determine if $G$ is Abelian or non-Abelian. Variants of this problem
have been studied in the setting of a binary operation acting on a set. Given an
$n \times n$ table of a binary operation $\circ$ on a set of size $n$, it is easy to see that the
complexity of testing whether $\circ$ is commutative is $\Omega(n^2)$ by hiding a non-commuting
pair $a$ and $b$. Ravi Kumar and Rubinfeld [5] show that this lower bound holds
even when $\circ$ is known to be cancellative. Pak [11] considered a related problem
of testing the commutativity of a group $G$ when the set of generators of $G$ is
given.



1.2 Our Results 

Our main result is the following theorem. 

Theorem 1. Group isomorphism for Abelian groups of size $n$ can be determined
in time $O(n \log p)$, where $p$ is the smallest prime non-divisor of $n$.

Using the Weak Prime Number Theorem, $p$ can be bounded by $O(\log n)$.
For the sake of completeness, we include a proof of this in Section 3.4. So, the
worst case running time of our algorithm is $O(n \log\log n)$ and in particular, our
algorithm runs in time $O(n)$ for all odd $n$ and for any $n$ which has a small prime
not dividing it.

It is well-known (see Fact 1, Section 2) that two Abelian groups are isomorphic
if and only if the number of elements of order $m$ in each group is the same,
$\forall\, 1 \le m \le n$. Since the order of any element is an integer between 1 and $n$, if the
orders of all elements are known, the number of elements of order $m$, $1 \le m \le n$,
can be determined in $O(n)$ time with random access. Hence, our improved algorithm
for isomorphism testing in Abelian groups follows immediately from the
following theorem.

Theorem 2 (Order finding). Given any group $G$ of order $n$, we can compute
the orders of all elements in $G$ in $O(n \log p)$ time, where $p$ is the smallest prime
non-divisor of $n$.

Note that this theorem works for all groups, not just Abelian groups. However,
the reduction of the isomorphism problem to order finding is not valid for
general groups (see Section 2 for a counter-example).

Our $O(n \log p)$ algorithm for order finding generalizes the idea used for
p-groups by Vikas in [13] to general groups. Vikas in [13] showed that the orders
of all elements of a p-group can be computed in $O(n)$ time. The $O(n \log n)$
isomorphism algorithm in [13] for Abelian group isomorphism does not compute
the orders of all elements. The only algorithm given in [13] for computing the
orders of elements in a general group is the naive $O(n^2)$ algorithm, which takes
every element of the group and computes its successive powers till identity is
seen.
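
For reference, the naive quadratic method mentioned above can be sketched as follows. This is an illustration only: it assumes elements are encoded as 0, ..., n-1 (the paper uses 1, ..., n) and that the table is a list of lists with `M[a][b]` equal to the product ab; the names `naive_orders` and `Z6` are hypothetical.

```python
def naive_orders(M, e):
    """Return ord(a) for every element a, given the table M and the identity e."""
    n = len(M)
    orders = [0] * n
    for a in range(n):
        power, k = a, 1
        while power != e:          # multiply by a until the identity is reached
            power = M[power][a]
            k += 1
        orders[a] = k
    return orders

# Example table: the additive group Z_6, where M[a][b] = (a + b) mod 6 and e = 0.
Z6 = [[(a + b) % 6 for b in range(6)] for a in range(6)]
```

Each element may need up to n multiplications, which gives the $O(n^2)$ bound.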

For determining if a given group is commutative, we have the following result. 

Theorem 3. Testing the commutativity of a group given in the form of a multiplication
table can be done in time $O(n)$.

Organisation of the paper: In Section 2 we review some preliminaries
of group theory and define our model of computation. In Section 3 we describe
how to find orders of all elements in a group in $O(n \log p)$ time. In Section 4
we describe an $O(n)$ algorithm which determines whether a group is Abelian or
non-Abelian.

2 Preliminaries 

2.1 Group Theory 

A quick review of some definitions and facts from Group Theory [3,4,7]. 

1. Let $|G|$ denote the cardinality of a group $G$; the order of $G$ is $|G|$. For
$a \in G$, the order of $a$, denoted as ord(a), is the smallest integer $i \ge 1$ such
that $a^i = e$, where $e$ is the identity in $G$. If $G$ is finite, then ord(a) divides
$|G|$.

2. Let $a$ be an element of a group $G$ such that ord(a) = $k$. The set
$\langle a \rangle = \{a, a^2, a^3, \ldots, a^k = e\}$ is a group (subgroup of $G$) called the cyclic group
generated by $a$. One can easily prove that

$\mathrm{ord}(a) = \mathrm{ord}(a^i) \cdot \gcd(\mathrm{ord}(a), i) \quad \forall i = 1, 2, \ldots, k,$ (1)

where gcd is the greatest common divisor operator. Two finite cyclic groups
$\langle a \rangle$ and $\langle b \rangle$ are isomorphic iff ord(a) = ord(b). Let $Z_k$ denote a cyclic group
of order $k$.

3. The direct product of two groups $G_1$ and $G_2$ is the group $G_1 \times G_2 =
\{(a, b) \mid a \in G_1, b \in G_2\}$ whose multiplication is defined by $(a_1, b_1) * (a_2, b_2) =
(a_1 a_2, b_1 b_2)$. The product $a_1 a_2$ is computed in $G_1$ and the product $b_1 b_2$ is
computed in $G_2$. The order of $(a, b)$ is the least common multiple of ord(a)
and ord(b).

4. Any Abelian p-group of order $p^m$, $m \ge 1$, can be represented as

$G \cong Z_{p^{r_1}} \times Z_{p^{r_2}} \times \cdots \times Z_{p^{r_k}},$ (2)

where $\sum_{i=1}^{k} r_i = m$, $r_i \ge r_{i+1} > 0$, $1 \le i \le k - 1$.

We will call the sequence $p^{r_1}, p^{r_2}, \ldots, p^{r_k}$ the elementary divisor sequence or
EDS of $G$. The EDS of the trivial group is empty.

5. Any finite Abelian group $G$ of order $n > 1$ can be represented as

$G \cong G(p_1^{\alpha_1}) \times G(p_2^{\alpha_2}) \times \cdots \times G(p_s^{\alpha_s}),$ (3)

where $1 < p_1 < p_2 < \cdots < p_s$, $p_i$ a prime number, and $\alpha_i > 0$ an integer
$\forall i = 1, 2, \ldots, s$ such that $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_s^{\alpha_s}$ and $G(p_i^{\alpha_i})$ is an Abelian $p_i$-group
of order $p_i^{\alpha_i}$.

Note that each $G(p_i^{\alpha_i})$ can be further expanded to a direct product of cyclic
groups of prime power orders as explained in equation (2). The EDS of $G$
will be the EDS of $G(p_1^{\alpha_1})$ concatenated with the EDS of $G(p_2^{\alpha_2})$ and so on. Two
finite Abelian groups are isomorphic iff they have the same EDS.
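
Equation (1) can be checked numerically on a small example. The sketch below uses the additive cyclic group $Z_n$, where the "power" $a^i$ is written $i \cdot a \bmod n$; the helper names `additive_order` and `check_equation_1` are illustrative and not part of the paper.

```python
from math import gcd

def additive_order(a, n):
    """Order of a in the additive cyclic group Z_n, found by repeated addition."""
    k, cur = 1, a % n
    while cur != 0:
        cur = (cur + a) % n
        k += 1
    return k

def check_equation_1(a, n):
    """Check ord(a) = ord(a^i) * gcd(ord(a), i) for i = 1, ..., ord(a)."""
    k = additive_order(a, n)
    return all(k == additive_order(i * a, n) * gcd(k, i) for i in range(1, k + 1))
```

For instance, in $Z_{12}$ the element 8 has order 3, its double 16 mod 12 = 4 also has order 3, and gcd(3, 2) = 1.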

Fact 1. Two finite Abelian groups $G$ and $H$ with $|G| = |H| = n$ are isomorphic
iff the number of elements of order $m$ in each group is the same, $\forall\, 1 \le m \le n$.

Proof (from [13]). By equation (3), it is enough to prove it for Abelian p-groups.
The number of elements of order $p^t$, $t > 0$, in $Z_{p^{r_1}} \times Z_{p^{r_2}} \times \cdots \times Z_{p^{r_k}}$, where
$r_1 \ge r_2 \ge \cdots \ge r_k$, is $p^{tj + r_{j+1} + r_{j+2} + \cdots + r_k} (1 - 1/p^j)$, where $j$ is the value satisfying

(i) $t \le r_i$, $\forall i = 1, 2, \ldots, j$, and

(ii) $t > r_i$, $\forall i = j + 1, j + 2, \ldots, k$.

From this formula and the conditions (i) and (ii) on $j$, it is clear that the
number of elements of order $p^t$ in $Z_{p^{r_1}} \times Z_{p^{r_2}} \times \cdots \times Z_{p^{r_k}}$ is equal to the number of
elements of order $p^t$ in $Z_{p^{r'_1}} \times Z_{p^{r'_2}} \times \cdots \times Z_{p^{r'_{k'}}}$, $\forall t > 0$, iff the sequences $r_1, r_2, \ldots, r_k$
and $r'_1, r'_2, \ldots, r'_{k'}$ are identical, i.e., iff the EDS's of the two Abelian p-groups are
identical, i.e., iff the two Abelian p-groups are isomorphic. □
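
Fact 1 turns isomorphism testing for Abelian groups into a comparison of order counts. A minimal sketch, assuming the orders of all elements have already been computed and are passed in as plain lists (the function names are illustrative):

```python
def order_histogram(orders, n):
    """counts[m] = number of elements of order m, for 1 <= m <= n."""
    counts = [0] * (n + 1)
    for o in orders:
        counts[o] += 1
    return counts

def same_order_counts(orders_g, orders_h):
    """True iff two groups of equal size have the same number of elements of each order.

    By Fact 1 this decides isomorphism when both groups are Abelian; it is NOT a
    valid isomorphism test for general groups.
    """
    n = len(orders_g)
    if len(orders_h) != n:
        return False
    return order_histogram(orders_g, n) == order_histogram(orders_h, n)
```

Both functions run in $O(n)$ time using random access into the `counts` array.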

The above fact is not true for general groups. Consider the set $H$ of $3 \times 3$ upper
triangular matrices with 1's on the diagonal and the 3 entries above the diagonal
from the field $Z_3$. $H$ forms a non-Abelian group under matrix multiplication and
its size is 27. The order of any non-trivial element in $H$ is 3, as shown below.

$$\begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}^3 = \begin{pmatrix} 1 & 3x & 3y + 3xz \\ 0 & 1 & 3z \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Consider the Abelian group $H' = Z_3 \times Z_3 \times Z_3$. Both $H$ and $H'$ have 26 elements
of order 3 and 1 element of order 1. But $H \not\cong H'$.
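
The counter-example can be verified by brute force. In this sketch a matrix with entries $x, y, z$ above the diagonal is encoded as the tuple `(x, y, z)`; the encoding and the names `mul`, `order`, `order_counts` are choices made for this illustration.

```python
from collections import Counter
from itertools import product

P = 3
IDENTITY = (0, 0, 0)

def mul(a, b):
    """Multiply two upper unitriangular 3x3 matrices over Z_3, encoded as (x, y, z)."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return ((x1 + x2) % P, (y1 + y2 + x1 * z2) % P, (z1 + z2) % P)

def order(a):
    """Order of a, found by repeated multiplication."""
    k, cur = 1, a
    while cur != IDENTITY:
        cur = mul(cur, a)
        k += 1
    return k

elements = list(product(range(P), repeat=3))
order_counts = Counter(order(a) for a in elements)
is_commutative = all(mul(a, b) == mul(b, a) for a in elements for b in elements)
```

The order counts coincide with those of $Z_3 \times Z_3 \times Z_3$, yet the group is not commutative, so no isomorphism can exist.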

2.2 Model of Computation 

The groups are represented by their multiplication tables. For a group $G$ of size
$n$, we define the multiplication table $M$ of $G$ to be a two-dimensional $n \times n$ array
of elements of $G$, indexed in both dimensions by the elements of $G$, such that for
all $a, b \in G$, $M(a, b) = ab$. This is equivalent to the multiplication oracle where
each group operation can be performed in constant time.

Random access in constant time is assumed. We assume that the elements of
the group are encoded as $1, 2, \ldots, n$, which we use to index into arrays whenever
needed. We also assume that arithmetic on $O(\log n)$ bits takes constant time.

3 Proof of Theorem 2: Order Finding in $O(n \log p)$ Time

3.1 Overview 

In this section, we present an algorithm to determine the orders of all elements
of $G$. The running time of the algorithm is $O(n \log p)$, where $p$ is the smallest
prime that does not divide $n$. It is easy to see that $p$ can be computed in time
$O(n)$ (see Section 3.3), so we will now assume that $p$ is available to us as part
of the input.

We use an array ORD[1..n] to store the orders of the elements of $G$. Initially,
the identity (call it $e$) is the only element whose order is known, so ORD[e] = 1.
(Note that we can easily determine $e$ in $O(n)$ time by examining the first row of the
table $M$.) When the algorithm terminates we will have ORD[x] = ord(x) for all
$x \in G$. We consider the elements of the group one after the other and for each
element we compute its order if it is not already known. Ideally, we would like to
perform this computation in time $O(\log p)$. Unfortunately, we cannot guarantee
that each such computation terminates in time $O(\log p)$. Instead, we shall ensure
that if we take time $m$ to compute the order of an element, then we also find the
orders of at least $\Omega(m / \log p)$ new elements. That is, we will amortize the cost of
this computation over the computation of the orders of several other elements.
By using a bit-array to mark elements whose orders are already known, it is then
easy to devise an algorithm that terminates in time $O(n \log p)$.



for each $x \in G$ whose order is not already known do

    1. Find the smallest power of $x$, call it $x^k$, whose order is known.

    2. Compute ord(x) using ord($x^k$).

    3. Compute the orders of $x^2, x^3, \ldots, x^{k-1}$ using ord(x).
end for



Fig. 1. A summary of our order finding algorithm 



Step 1 is implemented by computing $x^2, x^3, \ldots$ until we find an element $x^k$
whose order is already known. We have performed $k - 1$ group operations in
Step 1. However, in Step 3, which can be implemented using $O(k)$ arithmetic
operations, we will find the orders of $x^2, \ldots, x^{k-1}$. Thus, in Steps 1 and 3, we
perform only constant work per element. We still need to account for the work
involved in Step 2. Here, we will ensure that if $k + m$ operations are needed
to compute ord(x) from ord($x^k$), then the orders of at least $\Omega(m / \log p)$ new
elements are found. It is easy to see then, that Steps 1, 2 and 3, put together,
take only $O(\log p)$ time per element.
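
The three steps can be sketched in simplified form. This sketch is not the paper's $O(n \log p)$ algorithm: Step 2 uses plain repeated squaring rather than the p-th power strategy of Section 3.2, and Step 3 calls `math.gcd` for each index rather than the linear-time method of Section 3.3. It assumes elements are encoded as 0, ..., n-1 and `M[a][b]` holds the product ab; all function names are illustrative.

```python
from math import gcd

def group_pow(M, e, x, t):
    """Compute x^t by repeated squaring using the multiplication table M."""
    result, base = e, x
    while t:
        if t & 1:
            result = M[result][base]
        base = M[base][base]
        t >>= 1
    return result

def all_orders(M):
    n = len(M)
    e = next(b for b in range(n) if M[0][b] == 0)   # 0 * b == 0 only for b = e
    ORD = [0] * n                                    # 0 means "not yet known"
    ORD[e] = 1
    for x in range(n):
        if ORD[x]:
            continue
        # Step 1: powers[j] = x^(j+1); stop at the first power with known order.
        powers = [x]
        while not ORD[powers[-1]]:
            powers.append(M[powers[-1]][x])
        k = len(powers)
        t = ORD[powers[-1]]                          # t = ord(x^k)
        # Step 2: ord(x) = t * ord(y) with y = x^t, and ord(y) <= k.
        y = group_pow(M, e, x, t)
        ord_y, z = 1, y
        while z != e:
            z = M[z][y]
            ord_y += 1
        m = t * ord_y
        # Step 3: equation (1) gives ord(x^i) = ord(x) / gcd(ord(x), i).
        for i in range(1, k):
            ORD[powers[i - 1]] = m // gcd(m, i)
    return ORD
```

For example, on the multiplicative group modulo 7 (element $a$ stored at index $a - 1$), the element 3 reaches the already known power $3^2 = 2$ after one multiplication, so $k = 2$ and $t = 3$.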

3.2 Implementing Step 2 

We know $t$ = ord($x^k$). We want to compute ord(x) using $t$. A straightforward way
to do this would be to compute $y = x^t$ and then compute ord(y) by finding the
least $i \ge 1$ such that $y^i = e$. Then, we could set ord(x) = $t \cdot$ ord(y) (refer to equation
(1)). Computing ord(y) by this naive method involves ord(y) $- 1$ multiplications.
Since ord(y) $\le k$, this amounts to only constant cost per element whose order
is found in Step 3 of the current iteration. Thus, we need to worry only about
computing $y = x^t$ given $x$.

The standard method for computing $x^t$ given $x$ is by repeated squaring. This
might involve up to $\Theta(\log t)$ operations, and there seems to be no better bound on
$t$ than $O(n)$. So in the case when $t \gg p$, this method does not guarantee $O(\log p)$
average work per element. The main idea in the implementation of Step 2 is that
we can make crucial use of $p$. Note that gcd(ord(x), p) = 1 since gcd(n, p) = 1
and ord(x) is a factor of $n$. Hence, ord(x) = ord($x^p$). So, we will compute $y = x^t$
not by repeatedly squaring $x$ but by repeatedly computing the p-th powers of
$x$: $x^p, x^{p^2}, \ldots$. Suppose $\ell$ = ord($x^{p^i}$) is known. Then we immediately know that
$\ell$ is also the order of $x, x^p, \ldots, x^{p^{i-1}}$. Computing $x^{p^{i+1}}$ from $x^{p^i}$ (by repeated
squaring) takes $O(\log p)$ group operations. Thus, if we spend time $m$ to compute $x^t$
from $x$, then we compute the orders of at least $\Omega(m / \log p)$ new elements. More
precisely we have the following strategy. Compute $x^p, x^{p^2}, \ldots$ until one of the two
cases occurs:

We encounter an element $x^{p^i}$ whose order is known: We have done
$O(i \log p)$ work and we can set ord($x^{p^j}$) = ord($x^{p^i}$), for $j = 0, 1, \ldots, i - 1$. It is
easy to see that this amounts to $O(\log p)$ work per element. We have determined
ord(x) and can now proceed to Step 3.

We reach a power $p^i > t$: In this case we have already performed $O(\log_p t \cdot
\log p) = O(\log t + \log p)$ work. We determine $y = x^t$ by repeated squaring,
performing an additional $O(\log t)$ work, and then determine ord(x) from
ord(y) using $O(k)$ work, as described above. We also set ORD[$x^{p^j}$] = ord(x)
for $j = 1, \ldots, \lfloor \log_p t \rfloor$. The total work done in Step 2 is $O(k + \log_p t \cdot \log p)$;
we find the orders of at least $\max\{k, \lfloor \log_p t \rfloor\}$ new elements in Steps 2 and
3. Thus, we perform $O(\log p)$ work per element.



3.3 Implementing Step 3 Using $O(k)$ Arithmetic Operations

We want to compute ord($x^i$) from ord(x) using the formula
ord(x) = ord($x^i$) $\cdot$ gcd(ord(x), i).

Unfortunately, this involves computing gcd(ord(x), i), which takes $\Theta(\log i)$ time.
So the entire computation, if done naively, would require $\Omega(k \log k)$ arithmetic
operations. The following lemma shows that one can, in fact, compute all the
required gcds using $O(k)$ arithmetic operations. Thus, Step 3 can be implemented
with $O(k)$ arithmetic operations.

Lemma 1. (gcd(ord(x), i) : $i = 2, \ldots, k - 1$) can be computed using $O(k)$ arithmetic
operations.

In order to prove Lemma 1, we will need linear-time methods for computing
certain sets of primes and prime powers. We will make some claims about computing
such numbers, and use these claims to prove the lemma. Then, we will
provide proofs for the claims.



Claim 1. The set $Q_n$ of all primes in $\{1, 2, \ldots, n\}$ (in sorted order) can be
computed in $O(n)$ time.

The set $Q_n$ is computed once, at the beginning of the order finding algorithm.
Note that once $Q_n$ is available, we can determine in $O(k)$ time the set $Q_{k-1}$ of
primes in $\{1, \ldots, k - 1\}$. Using the set $Q_{k-1}$, the algorithm in Fig. 2 computes
the required gcd values and stores them in the array GCD[1..k-1]. This
algorithm is not efficient enough for our purposes, but it illustrates the main
ideas. By refining it we will be able to obtain an algorithm that justifies Lemma 1.

The algorithm starts by setting GCD[i] = i, for $1 \le i \le k - 1$. Then, it
considers each prime $q \in Q_{k-1}$ and computes the highest power $q^{r_q}$ of $q$ less
than $k$ that divides ord(x). If for some $i$, the highest power of $q$ that divides $i$
is more than $r_q$, say $r_q + \ell$, then it divides GCD[i] by $q$ over $\ell$ iterations. Thus,
in the end, for each $q$, the highest power of $q$ that divides GCD[i] is exactly the
minimum of the highest power of $q$ that divides $i$ and ord(x), that is, in the end
GCD[i] = gcd(ord(x), i). To compute $r_q$ we compute $q^1, q^2, \ldots$ until the power
exceeds $k - 1$. Clearly, the intermediate powers computed for different primes
are different, so the total number of arithmetic operations required to compute
$q^{r_q}$ for all primes $q \in Q_{k-1}$ is $O(k)$.



Set GCD[i] = i for $i = 1, 2, \ldots, k - 1$.

for all $q \in Q_{k-1}$ do

    Determine $r_q$, which is the highest power of $q$ such that $q^{r_q} < k$ and $q^{r_q}$ divides
    ord(x).

    power = $q^{r_q + 1}$ {All numbers between 1 and $k - 1$ which are multiples of power
    should have the extra powers of $q$ removed from their gcd value. We do that now.}
    while power < k do

        GCD[i] = GCD[i]/q for all i which are multiples of power;
        power = power $\cdot$ q;
    end while
end for



Fig. 2. Algorithm-GCD1

In the while loop corresponding to $q \in Q_{k-1}$, this algorithm visits at most

$$\frac{k}{q^{r_q + 1}} + \frac{k}{q^{r_q + 2}} + \cdots = O\!\left(\frac{k}{q^{r_q + 1}}\right)$$

locations of the array GCD. So, the running time of this algorithm can be
bounded by a constant times

$$\sum_{q \in Q_{k-1}} \frac{k}{q^{r_q + 1}}.$$

Unfortunately, if most $r_q$ are 0, this sum could be as large as $k \log\log k + O(k)$
[1]. So, we need another idea.
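
A direct transcription of Algorithm-GCD1 may make the bookkeeping clearer. In this sketch `m` stands for ord(x), `primes` is assumed to be the sorted list of primes below `k`, and index 0 of the returned list is unused; the function name `gcd_table_gcd1` is illustrative.

```python
def gcd_table_gcd1(m, k, primes):
    """Return GCD with GCD[i] = gcd(m, i) for 1 <= i <= k - 1 (Algorithm-GCD1)."""
    GCD = list(range(k))                 # GCD[i] = i initially
    for q in primes:
        # Find power = q^(r+1), where q^r is the highest power of q
        # that is below k and divides m.
        power = q
        while power < k and m % power == 0:
            power *= q
        # Remove the surplus factors of q from every multiple of q^(r+1), q^(r+2), ...
        while power < k:
            for i in range(power, k, power):
                GCD[i] //= q
            power *= q
    return GCD
```

An index $i$ that is divisible by $q^{r + \ell}$ is visited once for each of the powers $q^{r+1}, \ldots, q^{r+\ell}$, so it loses exactly $\ell$ factors of $q$.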

Proof of Lemma 1. We divide the primes in $\{1, 2, \ldots, k - 1\}$ into two sets: Good
and Bad. The primes in Good are factors of ord(x); those in Bad are not. The
sets Good and Bad are formed in $O(k)$ time by taking each element $q$ in $Q_{k-1}$
and checking if ord(x) mod $q$ = 0.

Every number between 2 and $k - 1$ has one of the following forms:

Good-Powers: product of powers of primes which belong to Good,

Bad-Powers: product of powers of primes which belong to Bad,

Mixed: product of a Good-Powers and a Bad-Powers.

Claim 2. The sets Good-Powers and Bad-Powers (each in sorted order) can be
computed in $O(k)$ time.



Initialize the array GCD[1, ..., k-1] by setting GCD[i] = i for each $i \in$ Good-Powers
and setting GCD[i] = 0 otherwise.

for all $q \in$ Good do

    Determine $r_q$, which is the highest power of $q$ such that $q^{r_q} < k$ and $q^{r_q}$ divides
    ord(x).

    power = $q^{r_q + 1}$;
    while power < k do

        GCD[i] = GCD[i]/q for all i which are multiples of power;
        power = power $\cdot$ q;
    end while
end for



Fig. 3. Computing gcd(ord(x), i) for $i \in$ Good-Powers



Clearly, the gcd of each element in Bad-Powers with ord(x) is 1. Furthermore,
every element in Mixed is expressible uniquely in the form $i \cdot j$, where
$i \in$ Good-Powers and $j \in$ Bad-Powers, and for such elements, gcd(ord(x), $i \cdot j$) =
gcd(ord(x), i). So, it is enough to compute gcd(ord(x), i) for $i \in$ Good-Powers.
In Fig. 3, we adapt the algorithm in Fig. 2 for this task. Arguing as above, one
can easily verify that when this algorithm terminates, GCD[i] = gcd(i, ord(x))
for every $i \in$ Good-Powers. The analysis of the running time is also similar. For a
fixed $q \in$ Good, at most $O(k / q^{r_q + 1})$ locations of GCD are visited in the while loop.
By definition of Good, $r_q \ge 1$. Hence, the running time of this algorithm can be
bounded by a constant times

$$\sum_{q \in \text{Good}} \frac{k}{q^{r_q + 1}} \le \sum_{q \in \text{Good}} \frac{k}{q^2} = O(k).$$

Thus, in linear time, we have computed GCD[i] for each $i \in$ Good-Powers.
As outlined above, after this, computing GCD[i] for the remaining elements is
straightforward. For $j \in$ Bad-Powers, we set GCD[j] = 1. For Mixed elements,
we run through elements $i \in$ Good-Powers, and for each $i$ we pick elements
$j \in$ Bad-Powers in increasing order (until $i \cdot j \ge k$) and set GCD[$i \cdot j$] = GCD[i].
This finishes the proof of Lemma 1. □
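
The Good/Bad decomposition used in the proof can be sketched as follows. The structure follows Fig. 3, but the helper `free_of` builds Good-Powers and Bad-Powers by simple marking instead of the linear-time procedure of Claim 2, so this sketch does not by itself achieve the $O(k)$ bound. As before, `m` stands for ord(x) and `primes` is assumed to list the primes below `k`; all names are illustrative.

```python
def gcd_table_good_bad(m, k, primes):
    """Return GCD with GCD[i] = gcd(m, i) for 1 <= i <= k - 1."""
    good = [q for q in primes if m % q == 0]
    bad = [q for q in primes if m % q != 0]

    def free_of(S):
        # Elements of {2, ..., k-1} with no prime factor in S (simple marking).
        marked = [False] * k
        for p in S:
            for i in range(p, k, p):
                marked[i] = True
        return [i for i in range(2, k) if not marked[i]]

    good_powers = free_of(bad)
    bad_powers = free_of(good)

    GCD = [0] * k
    if k > 1:
        GCD[1] = 1
    for i in good_powers:
        GCD[i] = i
    for q in good:                        # the loop of Fig. 3
        power = q
        while power < k and m % power == 0:
            power *= q
        while power < k:
            for i in range(power, k, power):
                GCD[i] //= q              # entries outside Good-Powers stay 0
            power *= q
    for j in bad_powers:
        GCD[j] = 1
    for i in good_powers:                 # Mixed elements i * j
        for j in bad_powers:
            if i * j >= k:
                break
            GCD[i * j] = GCD[i]
    return GCD
```

The last double loop visits each Mixed element exactly once because its factorisation into a Good-Powers part and a Bad-Powers part is unique.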



Proof of Claim 1. We will prove that the set of all primes in $\{1, 2, \ldots, n\}$
can be computed in $O(n)$ time. Let N[1..n] be an array such that N[i] = i. We
want to mark composite numbers in N, but if we mark all multiples of 2, followed
by all multiples of 3, and so on, then we mark some elements many times and we thus
spend more than $O(n)$ time. We now present a variant of an algorithm in [8] to
compute $Q_n$ in time $O(n)$. We use a doubly linked list L which takes care that
no element in N gets marked twice. L has $n$ nodes to begin with and the $i$th
node in L stores the value $i$. The $i$th location in N and the $i$th node in L are
connected to each other by pointers.

1. Initially, all the elements in N are unmarked and the set $Q_n$ is empty.

2. Pick the smallest unmarked element $i \ge 2$ in N and add $i$ to $Q_n$.

3. Starting from the first node, walk up the list L in increasing order until we
find the smallest number $r$ in it such that $r \cdot i > n$.

4. Walk down the list L (crucially, in decreasing order) from the node $r$, and
for each $l$ in L seen, mark the element N[$i \cdot l$] and delete $i \cdot l$ from L.

5. If N has any unmarked element $\ge 2$, go back to Step 2.

It can be easily shown that each element in N gets marked just once. By
charging the cost of traversals of L in every round to the elements that got
marked in that round, we can see that the algorithm takes $O(n)$ time. The set
$Q_n$ contains all primes $\le n$ (in sorted order). □
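
The linked-list sieve can be sketched with two index arrays in place of pointer-based nodes. This is one possible reading of Steps 1 to 5: the arrays `nxt` and `prv`, the sentinel nodes 0 and n+1, and the function name are assumptions of the sketch. Because marked values are unlinked immediately, the smallest unmarked element is always the successor of node 1.

```python
def primes_up_to(n):
    """Return the sorted list of primes <= n using a doubly linked list over 1..n."""
    nxt = [j + 1 for j in range(n + 2)]   # nxt[j]: next value still in L
    prv = [j - 1 for j in range(n + 2)]   # prv[j]: previous value still in L
    primes = []
    i = nxt[1]                            # smallest element >= 2 still in L
    while i <= n:
        primes.append(i)
        # Walk up: find the largest r in L with r * i <= n.
        r = 1
        while nxt[r] <= n and nxt[r] * i <= n:
            r = nxt[r]
        # Walk down in decreasing order, unlinking i * l for each l still in L.
        l = r
        while l >= 1:
            m = i * l
            nxt[prv[m]] = nxt[m]
            prv[nxt[m]] = prv[m]
            l = prv[l]
        i = nxt[1]
    return primes
```

Walking down matters: a value $l = i \cdot l'$ is used as a multiplier before it is unlinked while processing the smaller $l'$, so every composite is removed exactly once, by its smallest prime factor.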

Proof of Claim 2. Computing the set Good-Powers is equivalent to computing
the subset of elements of $[k - 1]$ that are not multiples of any element in Bad.
Similarly, computing the set Bad-Powers is equivalent to computing the subset
of elements of $[k - 1]$ that are not multiples of any element in Good. So, here we
describe an $O(k)$ algorithm which, given a subset $S$ of primes in $[k - 1]$, outputs
the subset $Q$ of elements of $[k - 1]$ that are not multiples of any element in $S$.

The algorithm for this problem is very similar to the algorithm given in the
proof of the previous claim. We maintain N and L as in the proof of the first
claim with numbers only till $k - 1$ now. Initially, all the elements in N are
unmarked.




286 



T. Kavitha 



for all $p \in S$ do

    Starting from the first node, walk up the list L in increasing order until we
    find the smallest number $r$ in it such that $r \cdot p \ge k$.

    Walk down the list L (in decreasing order) from the node $r$, and for each $l$
    in L seen, mark the element N[$p \cdot l$] and delete $p \cdot l$ from L.

end for

After this process is completed, any element in N which has a factor from $S$
would have got marked. So, all the unmarked elements in N (except the element
1) form the required set $Q$. It is easy to see that this algorithm takes $O(k)$ time.

□
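
The same array-based linked list gives a sketch of the procedure for Claim 2. It assumes $k \ge 2$ and that every element of `S` is a prime below `k`; the function name `not_multiples` and the sentinel nodes 0 and k are choices of this sketch.

```python
def not_multiples(k, S):
    """Return the sorted elements of {2, ..., k-1} that are multiples of no prime in S."""
    nxt = [j + 1 for j in range(k + 1)]   # list nodes 1..k-1, sentinels 0 and k
    prv = [j - 1 for j in range(k + 1)]
    for p in S:
        # Walk up: find the largest r still in L with r * p < k.
        r = 1
        while nxt[r] < k and nxt[r] * p < k:
            r = nxt[r]
        # Walk down, unlinking p * l for every l still in L.
        l = r
        while l >= 1:
            m = p * l
            nxt[prv[m]] = nxt[m]
            prv[nxt[m]] = prv[m]
            l = prv[l]
    # Collect the survivors, skipping the element 1.
    result, j = [], nxt[1]
    while j < k:
        result.append(j)
        j = nxt[j]
    return result
```

With `S` = Bad this returns Good-Powers, and with `S` = Good it returns Bad-Powers, each already in sorted order.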



3.4 A Bound on p 

Here we show a bound on the smallest prime p that does not divide n. 

Lemma 2. The value of the smallest prime non-divisor of n is O(logn). 

Proof. The value of $\pi(m)$, the number of prime numbers $\le m$, is given by the
following bounds for all $m > 1$ [1]:

$$\frac{m}{6 \log m} \le \pi(m) \le \frac{8m}{\log m}$$

So, the number of prime numbers between $\log n$ and $100 \log n$ is greater than
$\log n / \log\log n$. Their product is greater than $(\log n)^{\log n / \log\log n} = n$. Hence, all of
them cannot be factors of $n$. So, at least one of them (call that number $p$) is a
non-divisor of $n$. And, $p \le 100 \log n$. □
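
For concreteness, the smallest prime non-divisor can be found by a direct search. This sketch uses trial division for the primality test, which is adequate because the lemma shows the answer is $O(\log n)$; the function name is illustrative and this is not the $O(n)$ procedure the paper refers to.

```python
def smallest_prime_nondivisor(n):
    """Return the smallest prime p that does not divide n (n >= 1)."""
    p = 2
    while True:
        is_prime = all(p % d for d in range(2, int(p ** 0.5) + 1))
        if is_prime and n % p != 0:
            return p
        p += 1
```

For every odd $n$ the answer is 2, which is why the order finding algorithm runs in linear time on groups of odd size.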



4 Proof of Theorem 3: Testing the Commutativity of a 
Group 

Given a group $G$ in the form of a multiplication table, we would like to determine
if $G$ is Abelian or non-Abelian. It is easy to see that we have a lower bound
of $\Omega(n)$ for this problem in this model because a non-Abelian group of order
$n$ could have its centre (those elements of $G$ that commute with every other
element in $G$) as large as $n/4$.¹ We could arrange the elements of $G$ such that a
deterministic algorithm which reads just $o(n)$ entries, and hence has looked at
just $o(n)$ elements of $G$, would be unable to distinguish an Abelian group from
the centre of a non-Abelian group.

We show an upper bound of $O(n)$ for this problem. Our algorithm grows a set
of generators $S$ for $G$. It maintains the invariant that the subgroup $H$ generated
by the elements of $S$ is Abelian. $H$ is maintained as a set and we also use a
bit-array to mark the elements in $H$.

¹ If $G = Z_2 \times Z_2 \times \dots \times H$, where H is the group of quaternions, then the centre of 
G has size n/4. The group of quaternions is a non-Abelian group of order 8 generated by 
two elements i, j such that $i^4 = j^4 = 1$, $i^2 = j^2 = -1$, $ij = -ji$. 
Test-Abelian: 

Initially, all elements of G, except the identity e, are unmarked. Initialize the 
set S to ∅ and the group H to {e}. 
while True do 

1. Take the least unmarked element x and check if x commutes with all the 
elements in S. If it does not, declare G non-Abelian and exit. Otherwise, 
compute its successive powers $x^2, x^3, \dots$ till a marked element $x^k$ is seen. 

2. Compute the product $\{x, x^2, \dots, x^{k-1}\} \times H$ and mark all these elements. 
(We are using × here to denote the multiplication between elements of G.) 
Add all these elements to the group H. Add the element x to the set S. 

3. If every element in G is marked, then declare G Abelian and exit. 
end while 

The subgroup generated by the elements of S is denoted by ⟨S⟩. Observe that 
the subgroup generated by x and the elements of S is $\{e, x, x^2, \dots, x^{k-1}\} \times \langle S \rangle$. 
Hence, H = ⟨S⟩ is correctly maintained. The invariant that the subgroup ⟨S⟩ is 
Abelian is maintained because if x commutes with $g_1, g_2, \dots, g_l$ (which commute 
among themselves), then powers of x commute with elements in $\langle g_1, g_2, \dots, g_l \rangle$. 
So the subgroup generated by $x, g_1, g_2, \dots, g_l$ is Abelian. Hence, ⟨S⟩ is always 
Abelian and once all the elements are marked, it means that G = ⟨S⟩. Hence, G 
is Abelian. 

Running Time of Test-Abelian: 

Claim. No element gets marked twice. 

Proof. Assume that x is an unmarked element and $x^k$ is the smallest power of 
x that is marked. Then we need to show that the elements in $\{x, x^2, \dots, x^{k-1}\} \times \langle S \rangle$ 
are previously unmarked. Suppose some element in the above set, call it 
$x^i g_1^{a_1} g_2^{a_2} \cdots g_l^{a_l}$, is already marked, where $g_1, \dots, g_l \in S$, $1 \le i \le k-1$. Since only 
elements in ⟨S⟩ are marked, it means that $x^i g_1^{a_1} g_2^{a_2} \cdots g_l^{a_l} \in \langle S \rangle$, i.e., $x^i \in \langle S \rangle$, 
contradicting that $x^k$ was the smallest power of x that is marked. □ 

It follows from this claim that an element in G gets marked exactly once. 
The size of the group H at least doubles in every iteration of the while loop, so 
the size of the set S is at most log n. Hence, we do at most $\log^2 n$ commutativity 
checks over all the iterations of the while loop. Hence, it follows that the running 
time of the algorithm is O(n). 



5 Open Problems 



In this paper we have shown that isomorphism of two Abelian groups of order 
n can be determined in time O(n log p), where p is the smallest prime that does 
not divide n. We bound p by O(log n), so the worst case running time of this 
algorithm is O(n log log n). An open problem is to design an O(n) algorithm for 
Abelian group isomorphism. 
Abstract. In a recent paper [5], we addressed the problem of finding a 
minimum-cost spanning tree T for a given undirected graph G = (V,E) 
with maximum node-degree at most a given parameter B > 1. We devel- 
oped an algorithm based on Lagrangean relaxation that uses a repeated 
application of Kruskal’s MST algorithm interleaved with a combinatorial 
update of approximate Lagrangean node-multipliers maintained by the 
algorithm. 

In this paper, we show how to extend this algorithm to the case of 
Steiner trees where we use a primal-dual approximation algorithm due 
to Agrawal, Klein, and Ravi [1] in place of Kruskal’s minimum-cost span- 
ning tree algorithm. The algorithm computes a Steiner tree of maximum 
degree O(B + log n) and total cost that is within a constant factor of 
that of a minimum-cost Steiner tree whose maximum degree is bounded 
by B. However, the running time is quasi-polynomial. 



1 Introduction 

We consider the minimum-degree Steiner tree problem (B-ST) where we are given 
an undirected graph G = (V,E), a non-negative cost $c_e$ for each edge $e \in E$ and 
a set of terminal nodes $R \subseteq V$. Additionally, the problem input also specifies 
positive integers $\{B_v\}_{v \in V}$. The goal is to find a minimum-cost Steiner tree T 
covering R such that each node v in T has degree at most $B_v$, i.e. $\deg_T(v) \le B_v$ 
for all $v \in V$. We present an algorithm for the problem and give a proof of the 
following theorem. 

Theorem 1. There is a primal-dual approximation algorithm that, given a 
graph G = (V,E), a set of terminal nodes $R \subseteq V$, a nonnegative cost func- 
tion $c : E \to \mathbb{R}^+$, integers $B_v > 1$ for all $v \in V$, and an arbitrary b > 1, 
computes a Steiner tree T that spans the nodes of R such that 

1. $\deg_T(v) \le 12b \cdot B_v + \lceil 4 \log_b n \rceil + 1$ for all $v \in V$, and 

2. $c(T) \le 3\,\mathrm{OPT}$ 

where OPT is the minimum cost of any Steiner tree whose degree at node v is 
bounded by $B_v$ for all v. Our method runs in $O(n \log(|R|) \cdot \ldots)$ iterations, 
each of which can be implemented in polynomial time. 



P.K. Pandya and J. Radhakrishnan (Eds.): FSTTCS 2003, LNCS 2914, pp. 289-301, 2003. 
© Springer- Verlag Berlin Heidelberg 2003 







J. Konemann and R. Ravi 



The algorithm combines ideas from the primal-dual algorithm for Steiner 
trees due to Agrawal et al. [1] with local search elements from [2]. 

2 A Linear Programming Formulation 

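The following natural integer programming formulation models the problem. Let 
$\mathcal{U} = \{U \subset V : U \cap R \neq \emptyset,\ R \setminus U \neq \emptyset\}$. 

$$
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e && && \text{(IP)} \\
\text{s.t.}\ & x(\delta(U)) \ge 1 && \forall U \in \mathcal{U} && (1) \\
& x(\delta(v)) \le B_v && \forall v \in V && (2) \\
& x \text{ integer}
\end{aligned}
$$

The dual of the linear programming relaxation (LP) of (IP) is given by 

$$
\begin{aligned}
\max\ & \sum_{U \in \mathcal{U}} y_U - \sum_{v \in V} \lambda_v \cdot B_v && && \text{(D)} \\
\text{s.t.}\ & \sum_{U : e \in \delta(U)} y_U \le c_e + \lambda_u + \lambda_v && \forall e = uv \in E && (3) \\
& y, \lambda \ge 0
\end{aligned}
$$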

We also let (IP-ST) denote (IP) without constraints of type (2). This is the 
usual integer programming formulation for the Steiner tree problem. Let the LP 
relaxation be denoted by (LP-ST) and let its dual be (D-ST). 
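For a 0/1 vector x, constraint (1) says that every cut separating two terminals is crossed by a chosen edge, which is the same as requiring all terminals to lie in one connected component of the chosen edges; constraint (2) is a plain degree check. The sketch below tests both conditions for a candidate edge set. The function name and the input representation (edge list, terminal set, dictionary of bounds) are assumptions made for illustration.

```python
from collections import defaultdict


def is_feasible_for_ip(chosen_edges, terminals, B):
    """Check constraints (1) and (2) of (IP) for a 0/1 edge selection.

    chosen_edges: list of (u, v) pairs with x_e = 1
    terminals:    iterable of terminal nodes R
    B:            dict mapping every node to its degree bound B_v
    """
    degree = defaultdict(int)
    adjacency = defaultdict(list)
    for u, v in chosen_edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Constraint (2): x(delta(v)) <= B_v for every node.
    if any(degree[v] > B[v] for v in degree):
        return False

    # Constraint (1): all terminals must be connected by chosen edges.
    terminals = list(terminals)
    if not terminals:
        return True
    seen = {terminals[0]}
    stack = [terminals[0]]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return all(t in seen for t in terminals)


if __name__ == "__main__":
    star = [("a", "s"), ("b", "s"), ("c", "s")]
    bounds = {"a": 1, "b": 1, "c": 1, "s": 3}
    print(is_feasible_for_ip(star, {"a", "b", "c"}, bounds))
```

Dropping the degree check gives a feasibility test for (IP-ST), the ordinary Steiner tree formulation.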

3 An Algorithm for the Steiner Tree Problem 

Our algorithm is based upon previous work on the generalized Steiner tree prob- 
lem by Agrawal et al. [1]. A fairly complete description of their algorithm to 
compute an approximate Steiner tree can be found in the extended version of 
this paper. We refer to the algorithm by AKR. 

Before proceeding with the description of an algorithm for minimum-cost 
degree-bounded Steiner trees, we present an alternate view of Algorithm AKR 
which simplifies the following developments in this paper. 



3.1 An Alternate View of AKR 

Executing AKR on an undirected graph G = (V,E) with terminal set $R \subseteq V$ 
and costs $\{c_e\}_{e \in E}$ is, in a certain sense, equivalent to executing AKR on the 
complete graph H with vertex set R where the cost of edge $e = uv \in R \times R$ is 
equal to the minimum cost of any u,v-path in G. 
Let $\mathcal{P}_{S,T}$ denote the set of s,t-Steiner paths for all $s \in R \cap S$ and $t \in R \cap T$. 
For an s,t-Steiner path P, we define 

$$c^{\lambda}(P) = c(P) + \lambda_s + \lambda_t + 2 \cdot \sum_{v \in \mathrm{int}(P)} \lambda_v$$

and finally, let 

$$\mathrm{dist}_{c^{\lambda}}(S, T) = \min_{P \in \mathcal{P}_{S,T}} c^{\lambda}(P)$$

be the minimum $c^{\lambda}$-cost for any S,T-Steiner path. We now work with the fol- 
lowing dual: 

$$
\begin{aligned}
\max\ & \sum_{U \subset R} y_U - \sum_{v \in V} \lambda_v \cdot B_v && && \text{(D2)} \\
\text{s.t.}\ & \sum_{U \subset R,\ s \in U,\ t \notin U} y_U \le c^{\lambda}(P) && \forall s, t \in R,\ P \in \mathcal{P}_{s,t} && (4) \\
& y, \lambda \ge 0
\end{aligned}
$$

Let H be a complete graph on vertex set R. We let the length of edge $(s, t) \in 
E[H]$ be $\mathrm{dist}_{c^{\lambda}}(s, t)$. Running AKR on input graph H with length function $\mathrm{dist}_{c^{\lambda}}$ 
yields a tree in H that corresponds to a Steiner tree spanning the nodes of R 
in G in a natural way. We also obtain a feasible dual solution for (D2). The 
following lemma shows that (D) and (D2) are equivalent (the proof is deferred 
to the extended version of this paper). 

Lemma 1. (D) and (D2) are equivalent. 
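The path cost used in (D2) charges each endpoint multiplier once and each interior multiplier twice. Summing the right-hand side of constraint (3), $c_e + \lambda_u + \lambda_v$, over the edges of the path gives the same value, which is one way to see why the two duals are related. The sketch below computes both quantities; the function names and the representation of a path as a node list are illustrative assumptions, and the path is assumed to have at least two nodes.

```python
def c_lambda(path, cost, lam):
    """Cost c^lambda(P) of a path given as a node list [s, ..., t].

    cost: dict mapping frozenset({u, v}) to the edge cost c_e
    lam:  dict mapping each node to its multiplier lambda_v
    """
    edge_cost = sum(cost[frozenset((u, v))] for u, v in zip(path, path[1:]))
    interior = path[1:-1]
    return (edge_cost + lam[path[0]] + lam[path[-1]]
            + 2 * sum(lam[v] for v in interior))


def c_lambda_edgewise(path, cost, lam):
    """The same quantity, written as a sum of c_e + lambda_u + lambda_v."""
    return sum(cost[frozenset((u, v))] + lam[u] + lam[v]
               for u, v in zip(path, path[1:]))


if __name__ == "__main__":
    path = ["s", "a", "b", "t"]
    cost = {frozenset(("s", "a")): 1,
            frozenset(("a", "b")): 2,
            frozenset(("b", "t")): 3}
    lam = {"s": 1, "a": 2, "b": 3, "t": 4}
    print(c_lambda(path, cost, lam), c_lambda_edgewise(path, cost, lam))
```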

4 An Algorithm for the B-ST Problem 

In this section, we propose a modification of AKR in order to compute a feasi- 
ble degree-bounded Steiner tree of low total cost. We start by giving a rough 
overview of our algorithm. 

4.1 Algorithm: Overview 

We define the normalized degree $\mathrm{ndeg}_T(v)$ of a node v in a Steiner tree T as 

$$\mathrm{ndeg}_T(v) = \max\{0, \deg_T(v) - \beta_v \cdot B_v\} \qquad (5)$$

where $\{\beta_v\}_{v \in V}$ are parameters to be defined later. 
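Definition (5) is straightforward to evaluate once the tree is given as an edge list. The following sketch computes the normalized degree of every node that appears in the tree; the function name and the dictionary inputs `B` and `beta` are illustrative assumptions.

```python
def normalized_degrees(tree_edges, B, beta):
    """Return {v: ndeg_T(v)} for every node v touched by the tree edges.

    tree_edges: list of (u, v) pairs forming the Steiner tree T
    B:          dict of degree bounds B_v
    beta:       dict of the parameters beta_v from definition (5)
    """
    degree = {}
    for u, v in tree_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {v: max(0, d - beta[v] * B[v]) for v, d in degree.items()}


if __name__ == "__main__":
    # A star with centre "s" and five leaves, all bounds 1, all beta 2.
    edges = [("s", "t%d" % i) for i in range(5)]
    nodes = {"s"} | {v for _, v in edges}
    ndeg = normalized_degrees(edges, {v: 1 for v in nodes}, {v: 2 for v in nodes})
    print(ndeg["s"], max(ndeg.values()))
```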

Algorithm B-ST goes through a sequence of Steiner trees $H^0, \dots, H^t$ and 
associated pairs of primal (infeasible) and dual feasible solutions $x^i$, $(y^i, \lambda^i)$ for 
$0 \le i \le t$. The goal is to reduce the maximum normalized degree of at least one 
node in V in the transition from one Steiner tree to the next. 

In the i-th iteration our algorithm passes through two main steps: 
Compute Steiner tree. Compute an approximate Steiner tree spanning the 
nodes of R for our graph G = (V, E) using a modified version of AKR. Roughly 
speaking, this algorithm implicitly assumes a cost function $\tilde{c}$ that satisfies 

$$c(P) \le \tilde{c}(P) \le c^{\lambda^i}(P) \qquad (6)$$

for all Steiner paths P. 

When the algorithm finishes, we obtain a primal solution $x^i$ together with a 
corresponding dual solution $y^i$. In the following we use $\mathcal{P}^i$ to denote the set 
of paths used by AKR to connect the terminals in iteration i. 

Notice that using the cost function $\tilde{c}$ that satisfies (6) ensures that $(y^i, \lambda^i)$ is a 
feasible solution for (D2). The primal solution $x^i$ may induce high normalized 
degree at some of the vertices of V and hence may not be feasible for (IP). 

Update node multipliers. The main goal here is to update the node multi- 
pliers $\lambda^i$ such that another run of AKR yields a tree in which the normalized 
degree of at least one node decreases. Specifically, we continue running our 
algorithm as long as the maximum normalized node-degree induced by $x^i$ is 
at least $2 \log_b n$ where b > 1 is a positive constant to be specified later. 

Let $\Delta^i$ be the maximum normalized degree of any node in the tree induced 
by $x^i$. The algorithm then picks a threshold $d^i \ge \Delta^i - \lceil 4 \log_b n \rceil + 2$. Sub- 
sequently we raise the λ values of all nodes that have normalized degree at 
least $d^i - 2$ in the tree induced by $x^i$ by some $\varepsilon^i > 0$. We also implicitly 
increase the $\tilde{c}$ cost of two sets of Steiner paths: 

1. those paths P that contain nodes of degree at least $d^i$ and 

2. those paths P that contain nodes of degree at least $d^i - 2$. 

We denote the set of all such paths by $\mathcal{L}^i$. 

Rerunning AKR replaces at least one Steiner path whose $\tilde{c}$-cost increased with 
a Steiner path whose length stayed the same. In other words, a path that 
touches a node of normalized degree at least $d^i$ is replaced by some other 
path that has only nodes of normalized degree less than $d^i - 2$. 

Throughout the algorithm we will maintain that the cost of the current tree 
induced by $x^i$ is within a constant factor of the dual objective function value 
induced by $(y^i, \lambda^i)$. By weak duality, this ensures that the cost of our tree is 
within a constant times the cost of any Steiner tree that satisfies the individual 
degree bounds. However, we are only able to argue that the number of iterations 
is quasi-polynomial. 

In the following, we will give a detailed description of the algorithm. In 
particular, we elaborate on the choice of $\varepsilon^i$ and $d^i$ in the node-multiplier update 
and on the modification to AKR that we have alluded to in the previous intuitive 
description. 



4.2 Algorithm: A Detailed Top-Level Description 

We first present the pseudo-code of our B-ST-algorithm. In the description of 
the algorithm we use the abbreviation $\mathrm{ndeg}^i(v)$ in place of $\mathrm{ndeg}_{H^i}(v)$ for the 
normalized degree of vertex v in the Steiner tree $H^i$. We also let $\Delta^i$ denote the 
maximum normalized degree of any vertex in $H^i$, i.e. 

$$\Delta^i = \max_{v \in R} \mathrm{ndeg}^i(v).$$

Furthermore, we adopt the notation of [2] and let 

$$S_d^i = \{v \in V : \mathrm{ndeg}^i(v) \ge d\}$$

be the set of all nodes whose normalized degrees are at least d in the i-th solution. 
The following lemma is proved easily by contradiction. 

Lemma 2. There is a $d^i \in \{\Delta^i - \lceil 4 \log_b n \rceil + 2, \dots, \Delta^i\}$ such that 

$$\sum_{v \in S^i_{d^i - 2}} B_v \le b \cdot \sum_{v \in S^i_{d^i}} B_v$$

for a given constant b > 1. 

The low expansion of $S^i_{d^i}$ turns out to be crucial in the analysis of the 
performance guarantee of our algorithm. 

Finally, we let mod-AKR denote a call to the modified version of the Steiner 
tree algorithm AKR. Algorithm 1 has the pseudo code for our method. 



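Algorithm 1 An algorithm to compute an approximate minimum-cost degree-bounded Steiner tree. 

1: Given: primal feasible solution $x^0, \mathcal{P}^0$ to (LP-ST) and dual feasible solution $y^0$ to (D-ST) 
2: $\lambda_v^0 \leftarrow 0$, $\forall v \in V$ 
3: $i \leftarrow 0$ 
4: while $\Delta^i > 4\lceil \log_b n \rceil$ do 
5: Choose $d^i \ge \Delta^i - \lceil 4 \log_b n \rceil + 2$ s.t. $\sum_{v \in S^i_{d^i-2}} B_v \le b \cdot \sum_{v \in S^i_{d^i}} B_v$ 
6: Choose $\varepsilon^i > 0$ and identify swap pair $\langle P^i, \bar{P}^i \rangle$. 
7: $\lambda_v^{i+1} \leftarrow \lambda_v^i + \varepsilon^i$ if $v \in S^i_{d^i-2}$ and $\lambda_v^{i+1} \leftarrow \lambda_v^i$ otherwise 
8: $y^{i+1} \leftarrow$ mod-AKR($\mathcal{P}^i, \varepsilon^i, y^i, \langle P^i, \bar{P}^i \rangle$) 
9: $\mathcal{P}^{i+1} \leftarrow \mathcal{P}^i \setminus \{P^i\} \cup \{\bar{P}^i\}$ 
10: $i \leftarrow i + 1$ 
11: end while 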



Step 6 of Algorithm 1 hides the details of choosing an appropriate $\varepsilon^i$. We 
lengthen all Steiner paths in $\mathcal{L}^i$. Our choice of $\varepsilon^i$ will ensure that there exists at 
least one point in time during the execution of a slightly modified version of AKR 
in step 8 at which we now have the choice to connect two moats using paths $P^i$ 
and $\bar{P}^i$, respectively. We show that there is a way to pick $\varepsilon^i$ such that 
We now break ties such that $\bar{P}^i$ is chosen instead of $P^i$ and hence, we end up 
with a new Steiner tree $H^{i+1}$. 

In mod-AKR, we prohibit including alternate paths that contain nodes from 
$S^i_{d^i-2}$ and argue that the dual load that such a non-tree path P' sees does not 
go up by more than $\varepsilon^i$. Hence, we preserve dual feasibility. 

We first present the details of Algorithm mod-AKR and discuss how to find $\varepsilon^i$ 
afterwards. 



4.3 Algorithm: mod-AKR 

Throughout this section and the description of mod-AKR we work with the mod- 
ified dual (D2) as discussed in Section 3.1. 

For an $r_1, r_2$-Steiner path P we let $\mathcal{R}_P \subseteq 2^R$ denote all sets $S \subseteq R$ that contain 
exactly one of $r_1, r_2 \in R$. For a dual solution y, λ we then define the cut-metric 
$l_y(P) = \sum_{S \in \mathcal{R}_P} y_S$. From here it is clear that (y, λ) is a feasible dual solution 
iff $l_y(P) \le c^{\lambda}(P)$ for all Steiner paths P. We use $l^i(P)$ as an abbreviation for 
$l_{y^i}(P)$. 

At all times during the execution of Algorithm 1 we want to maintain dual 
feasibility, i.e. we maintain 

$$l^i(P) \le c^{\lambda^i}(P) \qquad (7)$$

for all Steiner paths P and for all i. Moreover, we want to maintain that for all 
i, the cost of any path $P \in \mathcal{P}^i$ is bounded by the dual load that P sees. In other 
words, we want to enforce that 

$$c(P) \le l^i(P) \qquad (8)$$

for all $P \in \mathcal{P}^i$ and for all i. It is easy to see that both (7) and (8) hold for i = 0 
from the properties of AKR. 

First, let $\mathcal{P}^i = \mathcal{P}^i_1 \cup \mathcal{P}^i_2$ be a partition of the set of Steiner paths used to 
connect the terminal nodes in the i-th iteration. Here, a path $P \in \mathcal{P}^i$ is added to 
$\mathcal{P}^i_1$ iff $P \cap S^i_{d^i} \neq \emptyset$ and we let $\mathcal{P}^i_2 = \mathcal{P}^i \setminus \mathcal{P}^i_1$. 

mod-AKR first constructs an auxiliary graph $G^i$ with vertex set R. We add an 
edge (s, t) to $G^i$ for each s,t-path $P \in \mathcal{P}^i \setminus \{P^i\}$. The edge (s, t) is then assigned 
a length of $l^{i+1}_{st} = l^i(P) + \varepsilon^i$ if $P \in \mathcal{P}^i_1$ and $l^{i+1}_{st} = l^i(P)$ otherwise. 

Assume that $\bar{P}^i$ is an s',t'-path. We then also add an edge connecting s' and 
t' to $G^i$ and let its length be the maximum of $l^i(\bar{P}^i)$ and $c(\bar{P}^i)$. Observe that, 
since $\mathcal{P}^i \setminus \{P^i\} \cup \{\bar{P}^i\}$ is a tree, $G^i$ is a tree as well. 

Subsequently, mod-AKR runs AKR on the graph $G^i$ and returns the computed 
dual solution. We will show that this solution together with $\lambda^{i+1}$ is feasible for 
(D2). A formal definition of mod-AKR is given in Algorithm 2. 

We defer the proof of invariants (7) and (8) to the end of the next section. 




Quasi-polynomial Time Approximation Algorithm 






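Algorithm 2 mod-AKR($\mathcal{P}^i, \varepsilon^i, y^i, \langle P^i, \bar{P}^i \rangle$): A modified version of AKR. 

1: Assume $\bar{P}^i$ is an s',t'-Steiner path 
2: $G^i = (R, E^i)$ where 
   $E^i = \{(s, t) : \exists\, s,t\text{-path } P \in \mathcal{P}^i \setminus \{P^i\}\} \cup \{(s', t')\}$ 
3: For all s,t-Steiner paths $P \in \mathcal{P}^i \setminus \{P^i\}$: 
   $l^{i+1}_{st} = l^i(P) + \varepsilon^i$ if $P \in \mathcal{P}^i_1$, and $l^{i+1}_{st} = l^i(P)$ otherwise 
4: $l^{i+1}_{s't'} = \max\{c(\bar{P}^i), l^i(\bar{P}^i)\}$ 
5: $y^{i+1} \leftarrow$ AKR($G^i, l^{i+1}$) 
6: return $y^{i+1}$ 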
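Steps 2 to 4 of mod-AKR amount to building a length function on a tree over the terminals. The sketch below constructs only that length function and leaves the call to AKR itself out, since AKR is not specified in this excerpt. All names (`mod_akr_lengths`, the path identifiers, the dictionary layout) are hypothetical and chosen for illustration.

```python
def mod_akr_lengths(paths, removed_id, new_endpoints, new_cost, new_load,
                    load, eps, lengthened):
    """Build the edge lengths of the auxiliary graph used by mod-AKR.

    paths:         dict path_id -> (s, t) endpoints of each current tree path
    removed_id:    id of the path being swapped out
    new_endpoints: (s', t') endpoints of the path being swapped in
    new_cost:      c-cost of the incoming path
    new_load:      dual load of the incoming path in the current iteration
    load:          dict path_id -> dual load of each current tree path
    eps:           the increment epsilon chosen for this iteration
    lengthened:    set of path ids that touch high normalized degree nodes
    """
    lengths = {}
    for path_id, endpoints in paths.items():
        if path_id == removed_id:
            continue  # the swapped-out path contributes no edge
        extra = eps if path_id in lengthened else 0
        lengths[endpoints] = load[path_id] + extra
    # The incoming path gets the larger of its cost and its dual load.
    lengths[new_endpoints] = max(new_cost, new_load)
    return lengths


if __name__ == "__main__":
    paths = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "d")}
    print(mod_akr_lengths(paths, removed_id=2, new_endpoints=("b", "d"),
                          new_cost=7, new_load=5,
                          load={1: 4, 2: 6, 3: 3}, eps=2, lengthened={1, 2}))
```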



4.4 Algorithm: Choosing $\varepsilon^i$ 

In this section, we show how to choose $\varepsilon^i$. Remember that, intuitively, we want to 
increase the cost of currently used Steiner paths that touch nodes of normalized 
degree at least $d^i$. The idea is to increase the cost of such paths by the smallest 
possible amount such that other non-tree paths whose length we did not increase 
can be used in their place. We make this idea more precise in the following. 

We first define $\mathcal{K}^i$ to be the set of connected components of 

$$G\Big[\bigcup_{P \in \mathcal{P}^i_2} P\Big].$$

Let $\mathcal{H}^i$ be an auxiliary graph that has one node for each set in $\mathcal{K}^i$. Moreover, $\mathcal{H}^i$ 
contains edge (K', K'') iff there is a K',K''-Steiner path in the set $\mathcal{P}^i_1$. It can be 
seen that each path $P \in \mathcal{P}^i_1$ corresponds to a unique edge in $\mathcal{H}^i$. It then follows 
from the fact that $G[\mathcal{P}^i]$ is a tree that $\mathcal{H}^i$ must also be a tree. 

For $K', K'' \in \mathcal{K}^i$ such that (K', K'') is not an edge of $\mathcal{H}^i$, let C be the unique 
cycle in $\mathcal{H}^i + (K', K'')$. We then use $\mathcal{P}^i(C)$ to denote the set of Steiner paths 
from $\mathcal{P}^i$ corresponding to edges on C. 

For any two connected components $K', K'' \in \mathcal{K}^i$ we let 

$$d^i(K', K'') = \min_{P \in \mathcal{P}_{K',K''}} c(P) \qquad (9)$$

be the cost of the minimum-cost K',K''-Steiner path that avoids nodes from 
$S^i_{d^i-2}$. For two components $K', K'' \in \mathcal{K}^i$ we denote the path that achieves 
the above minimum by $P_{K',K''}$. 

Definition 1. We say that a path $\bar{P}$ that contains no nodes from $S^i_{d^i-2}$ is 
ε-swappable against $P \in \mathcal{P}^i_1$ in iteration i if 

1. $P \in \mathcal{P}^i(C)$ where C is the unique cycle created in $\mathcal{H}^i$ by adding the edge 
corresponding to $\bar{P}$, and 

2. $c(\bar{P}) \le l^i(P) + \varepsilon$ 






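We are now looking for the smallest $\varepsilon^i$ such that there exists a witness pair 
of paths $\langle P^i, \bar{P}^i \rangle$ where $\bar{P}^i$ is $\varepsilon^i$-swappable against $P^i$. 

Formally consider all pairs $K', K'' \in \mathcal{K}^i$ such that (K', K'') is not an edge of 
$\mathcal{H}^i$. Inserting the edge corresponding to $P_{K',K''}$ into $\mathcal{H}^i$ creates a unique cycle 
C. For each such path $P \in \mathcal{P}^i(C)$, let $\varepsilon^i_{K',K''}(P)$ be the smallest non-negative 
value of ε such that 

$$d^i(K', K'') \le l^i(P) + \varepsilon.$$

We then let $\varepsilon^i_{K',K''} = \min_{P \in \mathcal{P}^i(C)} \varepsilon^i_{K',K''}(P)$ and define 

$$\varepsilon^i = \min_{K',K'' \in \mathcal{K}^i} \varepsilon^i_{K',K''}. \qquad (10)$$

We let $\langle P^i, \bar{P}^i \rangle$ be the pair of Steiner paths that defines $\varepsilon^i$, i.e. $\bar{P}^i$ is a K',K''- 
Steiner path such that 

1. inserting edge (K', K'') into $\mathcal{H}^i$ creates a cycle C and $P^i \in \mathcal{P}^i(C)$, and 

2. $c(\bar{P}^i) \le l^i(P^i) + \varepsilon^i$. 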
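Once the distances $d^i(K', K'')$ and the dual loads of the tree paths on each induced cycle are known, the choice in (10) is a double minimisation. The sketch below assumes those quantities have already been computed and are handed in as plain data; the function name and the tuple layout of `candidates` are hypothetical.

```python
def choose_epsilon(candidates):
    """Pick the smallest epsilon together with its witness pair.

    candidates: iterable of (pair, d_value, cycle_loads) where
        pair        identifies the two components (K', K''),
        d_value     is the cost d(K', K'') of the cheapest avoiding path,
        cycle_loads maps each tree path on the induced cycle to its dual load.
    Returns (epsilon, tree_path_id, pair) or None if there are no candidates.
    """
    best = None
    for pair, d_value, cycle_loads in candidates:
        for path_id, load in cycle_loads.items():
            # Smallest non-negative eps with d(K', K'') <= load + eps.
            eps = max(0, d_value - load)
            if best is None or eps < best[0]:
                best = (eps, path_id, pair)
    return best


if __name__ == "__main__":
    candidates = [
        (("K1", "K2"), 10, {"P1": 7, "P2": 4}),
        (("K1", "K3"), 6, {"P3": 5}),
    ]
    print(choose_epsilon(candidates))
```

Ties are broken here by iteration order, which is an arbitrary choice of this sketch.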

We are now in the position to show that (7) and (8) are maintained for our 
choice of $\langle P^i, \bar{P}^i \rangle$ and $\varepsilon^i$. The following lemma, whose proof is deferred to the 
full version of this paper, shows that mod-AKR produces a feasible dual solution 
for (D2) provided that $(y^i, \lambda^i)$ was dual feasible. 

Lemma 3. Algorithm 2 produces a feasible dual solution $(y^{i+1}, \lambda^{i+1})$ for (D2) 
given that $(y^i, \lambda^i)$ is dual feasible for (D2). 

This shows (7). It is clear from the choice of $\varepsilon^i$ that we include a Steiner path 
$\bar{P}^i$ into $\mathcal{P}^{i+1}$ only if $l^{i+1}(\bar{P}^i) \ge c(\bar{P}^i)$. (8) now follows since the dual load on any 
path is non-decreasing as we progress. 



4.5 Analysis: Performance Guarantee 

In this section we show that the cost of the tree computed by Algorithm 1 is 
within a constant factor of any Steiner tree satisfying all degree bounds. We en- 
sure this by way of weak duality. In particular, our goal is to prove the inequality 

$$\sum_{P \in \mathcal{P}^i} c(P) \le 3 \sum_{S \subset R} y^i_S - 3 \sum_{v \in V} \lambda^i_v \cdot B_v \qquad (11)$$

for all iterations i of our algorithm. 

First, we observe the following simple consequence of the AKR algorithm. 

Lemma 4. Assume that Algorithm 1 terminates after t iterations. For iteration 
$0 \le i \le t$, let $l^i_{\max} = \max_{P \in \mathcal{P}^i} l^i(P)$. We then must have 

$$\sum_{P \in \mathcal{P}^i} l^i(P) = 2 \sum_{S \subset R} y^i_S - l^i_{\max}.$$
PeP* ScR 
Proof. Let r = |R| and let $\mathcal{P}^i = \{P^i_1, \dots, P^i_{r-1}\}$ be the paths computed by 
mod-AKR in iteration i−1. Also let $y^i$ be the corresponding dual solution returned 
by mod-AKR. W.l.o.g. we may assume that 

$$l^i(P^i_1) \le \dots \le l^i(P^i_{r-1}).$$

From the AKR algorithm it is not hard to see that 

$$
\begin{aligned}
\sum_{S \subset R} y^i_S &= \sum_{j=1}^{r-1} \frac{l^i(P^i_j) - l^i(P^i_{j-1})}{2} \cdot (r - j + 1) \qquad (12) \\
&= \frac{1}{2} \cdot \sum_{j=1}^{r-1} l^i(P^i_j) \cdot \big((r - j + 1) - (r - j)\big) + \frac{1}{2}\, l^i(P^i_{r-1})
\end{aligned}
$$

where we define $l^i(P^i_0) = 0$. The last equality (12) can be restated as 

$$\sum_{P \in \mathcal{P}^i} l^i(P) = 2 \sum_{S \subset R} y^i_S - l^i_{\max}$$

and that yields the correctness of the lemma. 
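The telescoping step in this proof can be verified numerically. The sketch below evaluates the sum on the right-hand side of (12) for a list of path lengths and checks that twice its value minus the largest length equals the sum of all lengths, as Lemma 4 states. The function name is illustrative, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def dual_value_from_lengths(lengths):
    """Evaluate sum_j (l_j - l_{j-1}) / 2 * (r - j + 1) with l_0 = 0.

    lengths: the r - 1 path lengths (dual loads), in any order.
    """
    r = len(lengths) + 1
    total = Fraction(0)
    previous = 0
    for j, length in enumerate(sorted(lengths), start=1):
        total += Fraction(length - previous, 2) * (r - j + 1)
        previous = length
    return total


if __name__ == "__main__":
    lengths = [2, 4, 4, 10]
    dual = dual_value_from_lengths(lengths)
    # Lemma 4: the sum of lengths equals 2 * dual - max length.
    print(dual, 2 * dual - max(lengths), sum(lengths))
```

The factor (r − j + 1) counts the moats that are still growing while the j-th shortest path is being completed, which is the reading of AKR assumed here.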

We now proceed with proving (11) for all $1 \le i \le t$. Notice that Lemma 4 
together with (8) implies (11) for i = 0. We concentrate on the case $i \ge 1$. 

The proof is based on the following invariant that we maintain inductively 
for all $0 \le i \le t$: 

$$3 \cdot \sum_{v \in V} B_v \lambda^i_v \le \sum_{S \subset R} y^i_S. \qquad \text{(Inv)}$$

Since $\lambda^0_v = 0$ for all $v \in V$ by definition, (Inv) holds for i = 0. 

Growing $\lambda^i_v$ by $\varepsilon^i$ at nodes $v \in S^i_{d^i-2}$ decreases the right hand side of (11) by 
$3\varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v$. Still, the cost of the new Steiner tree $H^{i+1}$ is potentially higher than 
the cost of the old tree $H^i$. We must show that the first term on the right hand 
side of (11), i.e. $3 \cdot \sum_{S \subset R} y_S$, grows sufficiently to compensate for the decrease 
in the second term and the increased Steiner tree cost. In order to show this we 
need the following technical lemma that lower-bounds the number of paths that 
contain nodes of degree at least $d^i$ in terms of the number of nodes of normalized 
degree at least $d^i - 2$. 

Lemma 5. In each iteration $1 \le i \le t$ we must have 

$$|\mathcal{P}^i_1| \ge \alpha \cdot \sum_{v \in S^i_{d^i-2}} B_v$$

for an arbitrary parameter α > 0 by setting $\beta_v \ge 2\alpha b + 1/B_v$ for all $v \in V$ in 
the definition of $\mathrm{ndeg}_T(v)$ in (5). 
Fig. 1. Figure (1) shows a Steiner tree where circles represent terminals and squares 
represent Steiner nodes. We assume that there are exactly two nodes of high normalized 
degree: s and t. Figure (2) shows the set M of marked edges in red. Notice that the 
edge between Steiner nodes s and s' is not marked since there must be a Steiner path 
connecting a terminal node l on the left side and a terminal node r on the right side. 
This Steiner path has the form $\langle P_{ls}, ss', P_{s'r} \rangle$ and $P_{ls}$ contains node s, which has high 
normalized degree. 



Proof. We first define a set of marked edges 

$$M \subseteq \bigcup_{v \in S^i_{d^i}} \delta(v)$$

and then show that each Steiner path that contains nodes from $S^i_{d^i}$ has at most 
two marked edges. This shows that the cardinality of the set of marked edges is 
at most twice the number of paths in $\mathcal{P}^i_1$, i.e. 

$$|M| \le 2 \cdot |\mathcal{P}^i_1|. \qquad (13)$$

In the second part of the proof we argue that M is sufficiently large. 

First, we include all edges that are incident to terminal nodes from $S^i_{d^i}$ into 
M. Secondly, we also mark edges $uv \in H^i$ that are incident to non-terminal 
nodes in $S^i_{d^i}$ and that in addition satisfy that there is no Steiner path 

$$P = \langle P_1, uv, P_2 \rangle \in \mathcal{P}^i$$

such that both $P_1$ and $P_2$ contain nodes from $S^i_{d^i}$. 

It is immediately clear from this definition that each Steiner path $P \in \mathcal{P}^i$ 
has at most two edges from M. 

We now claim that M contains at least 

$$2b\alpha \cdot \sum_{v \in S^i_{d^i}} B_v \qquad (14)$$

edges. To see this, we let T be the tree on node set $S^i_{d^i}$ that is induced by $H^i$: 
For $s, t \in S^i_{d^i}$ we insert the edge st into T iff the unique s,t-path in $H^i$ has no 
other nodes from $S^i_{d^i}$. We let $P_e \subseteq H^i$ be the path that corresponds to an edge 
$e \in E[T]$. 
Define $E^i_{d^i} \subseteq E^i$ to be the set of tree edges that are incident to nodes of 
normalized degree at least $d^i$, i.e. 

$$E^i_{d^i} = \bigcup_{v \in S^i_{d^i}} \delta(v).$$

Now let $U \subseteq H^i$ be the set of unmarked tree edges that are incident to nodes of 
normalized degree at least $d^i$, i.e. $U = E^i_{d^i} \setminus M$. 

First observe that, by definition of M, for each unmarked edge $e \in U$ there 
must be an edge $e_t \in E[T]$ such that e is an edge on the path $P_{e_t}$. Moreover, 
for all $e_t \in E[T]$ there are at most two unmarked edges on the path $P_{e_t}$. Since 
T has $|S^i_{d^i}| - 1$ edges we obtain 

$$|U| \le 2 \cdot (|S^i_{d^i}| - 1). \qquad (15)$$

Each node in $S^i_{d^i}$ has at least $\beta_v B_v + d^i$ edges incident to it. On the other 
hand, since $H^i$ is a tree, at most $(|S^i_{d^i}| - 1)$ of the edges in $E^i_{d^i}$ are incident to 
exactly two nodes from $S^i_{d^i}$. Hence, we obtain 

$$|E^i_{d^i}| \ge \sum_{v \in S^i_{d^i}} (\beta_v B_v + d^i) - (|S^i_{d^i}| - 1) = 2\alpha b \cdot \sum_{v \in S^i_{d^i}} B_v + d^i \cdot |S^i_{d^i}| + 1 \qquad (16)$$

where the last equality uses the definition of $\beta_v$. 
Now observe that $|M| = |E^i_{d^i}| - |U|$ and hence 

$$|M| \ge 2\alpha b \cdot \sum_{v \in S^i_{d^i}} B_v + (d^i - 2) \cdot |S^i_{d^i}| + 3 \qquad (17)$$

using (15) and (16). Notice that $d^i \ge \Delta^i - \lceil 4 \log_b n \rceil + 2$ and $\Delta^i > \lceil 4 \log_b n \rceil$ and 
hence $d^i \ge 3$. This together with (17) and the fact that $S^i_{d^i}$ is non-empty implies 

$$|M| \ge 2\alpha b \cdot \sum_{v \in S^i_{d^i}} B_v. \qquad (18)$$

Combining (13) and (18) yields $|\mathcal{P}^i_1| \ge \alpha b \cdot \sum_{v \in S^i_{d^i}} B_v$. Using the fact that 
$\sum_{v \in S^i_{d^i-2}} B_v \le b \cdot \sum_{v \in S^i_{d^i}} B_v$ finishes the proof of the lemma. 

The following claim now presents the essential insight that ultimately yields 
the validity of (11). 

Lemma 6. Let α be as in Lemma 5. We then must have 

$$\sum_{S \subset R} y^{i+1}_S \ge \sum_{S \subset R} y^i_S + \frac{\alpha}{2} \cdot \varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v$$

for all $0 \le i \le t$. 
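Proof. We can use (12) to quantify the change in dual in iteration i. 

$$
\begin{aligned}
\sum_{S \subset R} (y^{i+1}_S - y^i_S) &= \frac{1}{2} \sum_{j=1}^{r-1} \big(l^{i+1}(P^{i+1}_j) - l^i(P^i_j)\big) + \frac{1}{2} \big(l^{i+1}(P^{i+1}_{r-1}) - l^i(P^i_{r-1})\big) \\
&\ge \frac{\varepsilon^i}{2} \cdot |\mathcal{P}^i_1|
\end{aligned}
$$

where the inequality follows from the fact that we increase the length of all paths 
in $\mathcal{P}^i_1$ by $\varepsilon^i$ and the lengths of all other paths are non-decreasing as we progress. 
An application of Lemma 5 finishes the proof. 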



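As mod-AKR finishes with cut metric $l^{i+1}$ we obtain 

$$l^{i+1}(\mathcal{P}^{i+1}) = \sum_{P \in \mathcal{P}^{i+1}} l^{i+1}(P) \le 2 \sum_{S \subset R} y^{i+1}_S \qquad (19)$$

from Lemma 4. Observe that the real cost of the Steiner tree $H^{i+1}$ is much 
smaller than $l^{i+1}(\mathcal{P}^{i+1})$. In fact, notice that we have 

$$
\begin{aligned}
c(\mathcal{P}^{i+1}) &\le l^{i+1}(\bar{P}^i) + c(\mathcal{P}^i \setminus \{P^i\}) \\
&\le l^{i+1}(\bar{P}^i) + l^i(\mathcal{P}^i \setminus \{P^i\})
\end{aligned}
\qquad (20)
$$

where the last inequality follows from (8), i.e. the l-cost of a Steiner path in $\mathcal{P}^i$ 
always dominates its c-cost. Also, observe that 

$$
\begin{aligned}
l^{i+1}(\mathcal{P}^i \setminus \{P^i\}) &= l^i(\mathcal{P}^i \setminus \{P^i\}) + \varepsilon^i \cdot |\mathcal{P}^i_1| \\
&\ge l^i(\mathcal{P}^i \setminus \{P^i\}) + \alpha \varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v
\end{aligned}
\qquad (21)
$$

using Lemma 5. Combining (19), (20) and (21) yields 

$$
\begin{aligned}
c(\mathcal{P}^{i+1}) &\le l^{i+1}(\mathcal{P}^{i+1}) - \alpha \varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v \\
&\le 2 \cdot \sum_{S \subset R} y^{i+1}_S - \alpha \varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v.
\end{aligned}
$$

We can now add (Inv) to the last inequality and get 

$$c(\mathcal{P}^{i+1}) \le 3 \sum_{S \subset R} y^{i+1}_S - 3 \cdot \sum_{v \in V} B_v \lambda^i_v - \alpha \varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v.$$

Finally notice that $\lambda^{i+1}_v = \lambda^i_v + \varepsilon^i$ if $v \in S^i_{d^i-2}$ and $\lambda^{i+1}_v = \lambda^i_v$ otherwise. Now 
choose $\alpha \ge 3$ and it follows that 

$$c(\mathcal{P}^{i+1}) \le 3 \sum_{S \subset R} y^{i+1}_S - 3 \cdot \sum_{v \in V} B_v \lambda^{i+1}_v.$$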
We have to show that (Inv) is maintained as well. Observe that the left hand 
side of (Inv) increases by $3\varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v$. We obtain from Lemma 6 that 

$$\sum_{S \subset R} y^{i+1}_S \ge \sum_{S \subset R} y^i_S + \frac{\alpha}{2} \cdot \varepsilon^i \cdot \sum_{v \in S^i_{d^i-2}} B_v.$$

Choosing $\alpha \ge 6$ shows that the right hand side of (Inv) increases sufficiently and 
(Inv) holds in iteration i + 1 as well. 

4.6 Analysis: Running Time 

For a Steiner tree $\mathcal{P}$ in path representation, we define its potential value as 

$$\Phi(\mathcal{P}) = \sum_{P \in \mathcal{P}} |R|^{\max_{v \in P} \mathrm{ndeg}_{\mathcal{P}}(v)}$$

where $\mathrm{ndeg}_{\mathcal{P}}(v)$ is the normalized degree of node v in the Steiner tree defined 
by $\mathcal{P}$. The proof of the following lemma is a direct adaptation of the arguments 
in [8] via the above potential function and is omitted. 

Lemma 7. Algorithm 1 terminates after $O(n \log(|R|) \cdot |R|^{\ldots})$ iterations. 
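To make the potential function concrete, the sketch below evaluates it for a tree given in path representation, reading the potential as the sum over paths P of |R| raised to the largest normalized degree of any node on P. That reading, the function name, and the input layout are assumptions of this sketch rather than details confirmed by the text.

```python
def potential(paths, ndeg, num_terminals):
    """Sum over all paths of |R| ** (max normalized degree on the path).

    paths:         list of paths, each a list of nodes
    ndeg:          dict node -> integer normalized degree
    num_terminals: |R|
    """
    return sum(num_terminals ** max(ndeg[v] for v in path) for path in paths)


if __name__ == "__main__":
    paths = [["a", "s", "b"], ["b", "c"]]
    ndeg = {"a": 0, "b": 0, "c": 0, "s": 2}
    print(potential(paths, ndeg, num_terminals=3))
```

Replacing a path through a high-degree node by one that avoids such nodes removes a large term and adds a much smaller one, which is the intuition behind using this quantity to bound the number of iterations.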
Abstract. We consider the sabotage modal logic SML which was sug- 
gested by van Benthem. SML is the modal logic equipped with a ‘transi- 
tion-deleting’ modality and hence a modal logic over changing models. 
It was shown that the problem of uniform model checking for this logic 
is PSPACE-complete. In this paper we show that, on the other hand, 
the formula complexity and the program complexity are linear, resp., 
polynomial time. Further we show that SML lacks nice model-theoretic 
properties such as bisimulation invariance, the tree model property, and 
the finite model property. Finally we show that the satisfiability problem 
for SML is undecidable. Therefore SML seems to be more related to FO 
than to usual modal logic. 



1 Introduction 

In [1] van Benthem considered ‘sabotage modal logics’ which are modal logics 
over changing models. He introduced a cross-model modality referring to sub- 
models from which objects have been removed. SML is modal logic equipped 
with a ‘transition-deleting’ modality. This logic is capable of expressing changes 
of transition systems themselves, in contrast to the usual specifications for systems, 
where only properties of a static system are expressed. As an application one 
can consider computer or traffic networks where connections may break down. 
One can express problems related to this situation by first order specifications, 
but then one has to put up with the high complexity of FO. So SML seems 
to be a moderate strengthening of modal logic for this kind of problem. In 
Sec. 2 we repeat the formal definition of the sabotage modal logic SML which is 
interpreted over edge-labelled transition systems. 

Two main questions arise in this context: the model checking problem and 
the synthesis problem for SML. The model checking problem is the question, 
given a transition system and a system specification expressed in SML, does 
the system satisfy the specification? The synthesis problem asks, given a system 
specification, whether there is a transition system which satisfies the specifica- 
tion. In [5] we showed that the problem of uniform model checking for SML is 
PSPACE-complete. But in many cases one of the inputs for the model checking 
problem is fixed, either a single property is specified by a formula and one wants 
to check it for several systems; or there are different properties which should 
be verified for a single system. For modal and temporal logics these two views 
of the complexity are usually referred to as program complexity and formula 
complexity of the model checking problem. In Sec. 3 we show that the formula 
complexity for SML is linear in the size of the formula and the program com- 
plexity for SML is polynomial in the size of the transition system. This result 
is in contrast to many other logics like LTL and CTL* where the formula com- 
plexity is as hard as the combined model checking complexity (cf. [6]). On the 
other hand, this result constitutes an interesting advantage over first order logic, 
since model checking for FO with a fixed transition system is PSPACE-complete 
(cf. [3]). Before we deal with the synthesis problem, we show in Sec. 4 that SML, 
in contrast to modal logic, lacks nice model-theoretic properties such as bisim- 
ulation invariance, the tree model property, and the finite model property. In 
Sec. 5 we split the synthesis problem into three questions: given a system spec- 
ification expressed as an SML-formula, the satisfiability problem asks whether 
there is a transition system at all which satisfies the specification, i.e. the sys- 
tem might be finite or infinite. The finite satisfiability problem asks for finite 
systems as models of the formula. And finally, the infinity axiom problem is the 
question whether a given formula has only infinite models. We will show that for 
SML all three problems are undecidable. We do that by reducing appropriate 
modifications of Post’s correspondence problem to these problems. 

We would like to thank Johan van Benthem for several ideas and comments 
on the topic. 



2 Sabotage Modal Logic 

In this section we repeat the formal definition of the sabotage modal logic SML 
with a ‘transition-deleting’ modality. We interpret the logic over edge-labelled 
transition systems. Let Prop = {p, p', p'', ...} be a set of unary predicate sym- 
bols. A (finite) transition system T is a tuple (S, Σ, R, L) with a finite set of 
states S, a finite alphabet Σ, a ternary transition relation R ⊆ S × Σ × S and 
a labelling function L : S → 2^Prop. Let p ∈ Prop and a ∈ Σ. Formulae of the 
sabotage modal logic SML over transition systems are inductively defined by the 
grammar 

φ ::= ⊤ | p | ¬φ | φ ∨ φ | ◇_a φ | ⟠_a φ. 

As usual, ⊥ is an abbreviation for ¬⊤. The dual modalities are defined by □_a φ := 
¬◇_a ¬φ and ⊟_a φ := ¬⟠_a ¬φ. Let T = (S, Σ, R, L) be a transition system. To 
define the semantics of SML and for later use we define the transition system 
T_E for a set E ⊆ R as T_E := (S, Σ, R \ E, L). 

For a given state s ∈ S we define the semantics of SML inductively by 

(T, s) ⊨ ⊤        for all T and all s ∈ S, 
(T, s) ⊨ p        iff p ∈ L(s), 
(T, s) ⊨ ¬φ       iff not (T, s) ⊨ φ, 
(T, s) ⊨ φ ∨ ψ    iff (T, s) ⊨ φ or (T, s) ⊨ ψ, 
(T, s) ⊨ ◇_a φ    iff there is s' ∈ S with (s, a, s') ∈ R and (T, s') ⊨ φ, 
(T, s) ⊨ ⟠_a φ    iff there is (t, a, t') ∈ R with (T_{{(t,a,t')}}, s) ⊨ φ. 
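The semantics above can be turned into a direct recursive evaluator. The sketch below is a naive implementation meant only to illustrate the clauses; it makes no attempt to meet the complexity bounds discussed later. The tuple encoding of formulae (`"top"`, `"prop"`, `"not"`, `"or"`, `"dia"`, `"sab"`) and the function name `holds` are assumptions introduced here.

```python
def holds(R, L, s, phi):
    """Decide whether (T, s) satisfies phi, where T has transitions R and labels L.

    R:   frozenset of transitions (source, label, target)
    L:   dict state -> set of propositions true at that state
    phi: nested tuple, one of
         ("top",), ("prop", p), ("not", f), ("or", f, g),
         ("dia", a, f)  for the ordinary diamond,
         ("sab", a, f)  for the transition-deleting diamond.
    """
    kind = phi[0]
    if kind == "top":
        return True
    if kind == "prop":
        return phi[1] in L.get(s, set())
    if kind == "not":
        return not holds(R, L, s, phi[1])
    if kind == "or":
        return holds(R, L, s, phi[1]) or holds(R, L, s, phi[2])
    if kind == "dia":
        a, sub = phi[1], phi[2]
        # Move along some a-labelled transition leaving s.
        return any(holds(R, L, t, sub) for (u, b, t) in R if u == s and b == a)
    if kind == "sab":
        a, sub = phi[1], phi[2]
        # Delete some a-labelled transition anywhere in the system; stay at s.
        return any(holds(R - {e}, L, s, sub) for e in R if e[1] == a)
    raise ValueError("unknown formula constructor: %r" % (kind,))


if __name__ == "__main__":
    R = frozenset({(0, "a", 1)})
    L = {1: {"p"}}
    step = ("dia", "a", ("prop", "p"))
    print(holds(R, L, 0, step))                # the a-successor satisfies p
    print(holds(R, L, 0, ("sab", "a", step)))  # deleting the only edge breaks it
```

Each sabotage operator branches over all transitions with the given label, so the running time of this evaluator grows exponentially with the nesting of sabotage operators.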

A measure for the complexity of an SML-formula φ is the number of nested 
sabotage operators. We call this the sabotage depth sd(φ) of φ and define induc- 
tively 

sd(⊤) := sd(p) := 0,    sd(φ_1 ∨ φ_2) := max{sd(φ_1), sd(φ_2)}, 

sd(¬ψ) := sd(◇_a ψ) := sd(ψ),    sd(⟠_a ψ) := sd(ψ) + 1. 

In the next section we will see that the sabotage depth of a formula is the main 
factor in the complexity of the model checking problem for SML. 
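
The semantics and the sabotage depth defined above can be sketched as a small recursive evaluator. This is only an illustration under stated assumptions: the tuple encoding of formulae and the names `holds` and `sd` are hypothetical and do not come from the paper. The evaluator follows the inductive definition literally, so its running time grows exponentially with the number of nested sabotage operators.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str, str]  # (source state, label, target state)

# Hypothetical formula encoding (an assumption, not the paper's notation):
#   ("top",), ("prop", p), ("not", f), ("or", f, g),
#   ("dia", a, f)  ordinary diamond over label a,
#   ("sab", a, f)  sabotage diamond over label a.

def holds(R: FrozenSet[Edge], L: Dict[str, Set[str]], s: str, phi: tuple) -> bool:
    """Decide (T, s) |= phi for the system with transitions R and labelling L."""
    op = phi[0]
    if op == "top":
        return True
    if op == "prop":
        return phi[1] in L.get(s, set())
    if op == "not":
        return not holds(R, L, s, phi[1])
    if op == "or":
        return holds(R, L, s, phi[1]) or holds(R, L, s, phi[2])
    if op == "dia":
        # Move along some a-edge leaving the current state.
        return any(holds(R, L, t, phi[2])
                   for (u, b, t) in R if u == s and b == phi[1])
    if op == "sab":
        # Delete some a-edge anywhere in the system and stay in s.
        return any(holds(R - {e}, L, s, phi[2]) for e in R if e[1] == phi[1])
    raise ValueError(f"unknown operator: {op}")

def sd(phi: tuple) -> int:
    """Sabotage depth: the maximal nesting of sabotage operators."""
    op = phi[0]
    if op in ("top", "prop"):
        return 0
    if op == "not":
        return sd(phi[1])
    if op == "or":
        return max(sd(phi[1]), sd(phi[2]))
    if op == "dia":
        return sd(phi[2])
    if op == "sab":
        return sd(phi[2]) + 1
    raise ValueError(f"unknown operator: {op}")
```

For instance, on a system whose only transition is an a-loop at s, the formula "after deleting some a-edge there is no a-successor" holds at s and has sabotage depth 1.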

3 Model Checking for SML 

In this section we consider the model checking problem for SML. The general 
question in model checking is whether a given structure is a model of a given for- 
mula. The model checking problem for modal logic (ML) over transition systems 
is known to be solvable in polynomial time (cf. [2]). 

Proposition 1. The model checking problem for ML is PTIME-complete and
can be solved in time $O(|\varphi| \cdot |T|)$, where $|\varphi|$ is the size of the given ML-formula
$\varphi$ and $|T|$ is the size of the given transition system $T$. □

The combined complexity of model checking for SML, i.e., the complexity mea- 
sured in terms of the size of the formula and in the size of the structure, was 
already settled in [5] . 

Theorem 2. Model checking for SML is PSPACE-complete. □

In many cases one of the inputs for the model checking problem is fixed. If 
one wants to verify a single property for several systems the formula is fixed 
and if there are different properties that have to be verified for a single system 
the structure is fixed. For modal and temporal logics these two views of the 
complexity are usually referred to as program complexity and formula complexity 
of the model checking problem. 

In the following we show that the model checking problem for SML with one 
of the inputs fixed (either the transition system or the formula) can be solved in 
linear, resp., in polynomial time. For this purpose we reduce the model checking 
problem for SML to the model checking problem for ML and show that this 
reduction can be done in linear, resp., in polynomial time if either the transition 
system or the formula is fixed. 

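Let $T = (S, \Sigma, R, L)$ be a transition system. We define a new transition
system $T^{\mathit{sab}} = (S^{\mathit{sab}}, \Sigma^{\mathit{sab}}, R^{\mathit{sab}}, L^{\mathit{sab}})$ that encodes all possible ways of
sabotaging $T$:

$\Sigma^{\mathit{sab}} := \Sigma \cup \{\bar{a} \mid a \in \Sigma\}$, $S^{\mathit{sab}} := S \times 2^R$,
$R^{\mathit{sab}} := \{((s_1, E), a, (s_2, E)) \mid (s_1, a, s_2) \in R \setminus E\} \cup$
$\{((s, E_1), \bar{a}, (s, E_2)) \mid \exists s_1, s_2 \in S\ (E_2 = E_1 \mathbin{\dot\cup} \{(s_1, a, s_2)\})\}$,
$L^{\mathit{sab}}(s, E) := L(s)$ for each $s \in S$ and $E \subseteq R$.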



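Over this system one can express the sabotage operator $\diamondminus_a$ by traversing an
$\bar{a}$-edge, i.e., by the modal operator $\Diamond_{\bar{a}}$. This motivates the following inductive
definition of the ML-formula $\hat\varphi$ for a given SML-formula $\varphi$:

$\hat\varphi := \begin{cases} \varphi & \text{if } \varphi = \top \text{ or } \varphi = p, \\ \hat\varphi_1 \vee \hat\varphi_2 & \text{if } \varphi = \varphi_1 \vee \varphi_2, \\ \neg\hat\psi & \text{if } \varphi = \neg\psi, \\ \Diamond_a \hat\psi & \text{if } \varphi = \Diamond_a \psi, \\ \Diamond_{\bar{a}} \hat\psi & \text{if } \varphi = \diamondminus_a \psi. \end{cases}$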



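If the sabotage depth of a formula $\varphi$ is small then we do not need the complete
transition system $T^{\mathit{sab}}$ to evaluate $\hat\varphi$. So, for $n \in \mathbb{N}$, we define $T^{\mathit{sab}}_n$ to be the
transition system $T^{\mathit{sab}}$ restricted to the states $(s, E)$ with $|E| \le n$. Note that
$T^{\mathit{sab}}_0$ is isomorphic to $T$ and $T^{\mathit{sab}}_n = T^{\mathit{sab}}$ for $n \ge |R|$.

Lemma 3. Let $T = (S, \Sigma, R, L)$ be a transition system and $\varphi$ be an SML-
formula. Then $(T, s) \models \varphi$ iff $(T^{\mathit{sab}}_{\mathrm{sd}(\varphi)}, (s, \emptyset)) \models \hat\varphi$.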

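Proof. We show by induction on the structure of $\varphi$ that for each $E \subseteq R$:

$(T_E, s) \models \varphi \iff (T^{\mathit{sab}}_{\mathrm{sd}(\varphi)+|E|}, (s, E)) \models \hat\varphi.$

For $E = \emptyset$ we obtain the desired property. The only interesting case for this
induction is for $\varphi = \diamondminus_a \psi$. The definitions of the semantics of the sabotage
operator and the structure $T_E$ imply that $(T_E, s) \models \diamondminus_a \psi$ iff there exists an edge
$(s_1, a, s_2) \in R \setminus E$ such that $(T_{E'}, s) \models \psi$ for $E' = E \mathbin{\dot\cup} \{(s_1, a, s_2)\}$. By the
induction hypothesis this holds iff $(T^{\mathit{sab}}_{\mathrm{sd}(\psi)+|E'|}, (s, E')) \models \hat\psi$. Since $\mathrm{sd}(\psi) + |E'| =
(\mathrm{sd}(\varphi) - 1) + (|E| + 1) = \mathrm{sd}(\varphi) + |E|$ and since there is an $\bar{a}$-edge from $(s, E)$
to $(s, E')$ we get $(T^{\mathit{sab}}_{\mathrm{sd}(\psi)+|E'|}, (s, E')) \models \hat\psi$ iff $(T^{\mathit{sab}}_{\mathrm{sd}(\varphi)+|E|}, (s, E)) \models \Diamond_{\bar{a}} \hat\psi$. This
implies the claim because $\hat\varphi = \Diamond_{\bar{a}} \hat\psi$. □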
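
The reduction of Lemma 3 can be illustrated by the following sketch, which builds the restricted system of deletion sets of size at most n, translates an SML-formula into an ML-formula, and evaluates the result with a plain modal evaluator. The tuple encoding of formulae, the representation of a barred label as `("bar", a)`, and the function names `build_sab`, `translate` and `ml_holds` are assumptions made for this example, not part of the paper.

```python
from itertools import combinations

def build_sab(S, R, n):
    """States and transitions of the sabotage system restricted to |E| <= n."""
    R = frozenset(R)
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(sorted(R), k)]
    states = {(s, E) for s in S for E in subsets}
    trans = set()
    for E in subsets:
        for (s1, a, s2) in R - E:
            # A surviving a-edge of T is copied into the layer for E.
            trans.add(((s1, E), a, (s2, E)))
            if len(E) < n:
                # Deleting (s1, a, s2) is a barred a-step that keeps the state.
                E2 = E | {(s1, a, s2)}
                for s in S:
                    trans.add(((s, E), ("bar", a), (s, E2)))
    return states, trans

def translate(phi):
    """Replace every sabotage diamond by a diamond over the barred label."""
    op = phi[0]
    if op in ("top", "prop"):
        return phi
    if op == "not":
        return ("not", translate(phi[1]))
    if op == "or":
        return ("or", translate(phi[1]), translate(phi[2]))
    if op == "dia":
        return ("dia", phi[1], translate(phi[2]))
    if op == "sab":
        return ("dia", ("bar", phi[1]), translate(phi[2]))
    raise ValueError(f"unknown operator: {op}")

def ml_holds(trans, L, state, phi):
    """Plain modal logic evaluation; a state (s, E) inherits the labels of s."""
    op = phi[0]
    if op == "top":
        return True
    if op == "prop":
        return phi[1] in L.get(state[0], set())
    if op == "not":
        return not ml_holds(trans, L, state, phi[1])
    if op == "or":
        return ml_holds(trans, L, state, phi[1]) or ml_holds(trans, L, state, phi[2])
    if op == "dia":
        return any(ml_holds(trans, L, t, phi[2])
                   for (u, b, t) in trans if u == state and b == phi[1])
    raise ValueError(f"unknown operator: {op}")
```

For a system consisting of a single a-loop at s and n = 1, the construction yields two states and two transitions, and the translated formula "delete an a-edge, then no a-successor exists" holds at (s, ∅), which agrees with evaluating the sabotage formula directly on the original system.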



This reduction can be used to determine the formula complexity and the program 
complexity of SML model checking. 

Theorem 4. The model checking problem for SML with a fixed transition
system can be solved in linear time in the size of the formula (formula complexity).
The model checking problem for a fixed SML-formula can be solved in polynomial
time in the size of the transition system (program complexity).

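Proof. By Proposition 1 and Lemma 3 we can solve the model checking problem
for $\varphi$ and $T$ in time $O(|\hat\varphi| \cdot |T^{\mathit{sab}}_{\mathrm{sd}(\varphi)}|)$. From the definition of $\hat\varphi$ we get $|\hat\varphi| = |\varphi|$.
For a fixed transition system $T$ we can estimate the size of $T^{\mathit{sab}}_{\mathrm{sd}(\varphi)}$ by $|T^{\mathit{sab}}_{\mathrm{sd}(\varphi)}| \in
O(|T| \cdot 2^{|T|})$, which is a constant. Hence the formula complexity is in $O(|\varphi|)$.
Since the number of subsets $E \subseteq R$ with $|E| \le \mathrm{sd}(\varphi)$ is in $O(|R|^{\mathrm{sd}(\varphi)})$,
we obtain for a fixed SML-formula $\varphi$ that $|T^{\mathit{sab}}_{\mathrm{sd}(\varphi)}|$ is bounded by a polynomial
in $|T|$ whose degree depends only on $\mathrm{sd}(\varphi)$. So the program
complexity is polynomial in $|T|$. □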



4 Model-Theoretic Properties of SML 

For many logics, e.g. temporal logics like CTL, CTL*, and LTL, satisfiability can 
be shown to be decidable using the small model property. A logic has the small 
model property if a formula from this logic is satisfiable iff it has a model of 
size bounded by a computable function of the size of the formula. To decide the 
satisfiability problem, provided that the model checking problem is decidable, it
is sufficient to check all structures up to this bounded size. Modal logic even
has the small tree model property, i.e., for each satisfiable ML-formula there 
exists a small tree that is a model of the formula. In this section we analyse 
model-theoretic properties of this kind for SML. 

In the sequel let $T = (S, \Sigma, R, L)$ be a transition system and $s \in S$. For any
subset $A \subseteq \Sigma$ we fix the SML-formula

$\sigma_A := \Diamond_A \top = \bigvee_{a \in A} \Diamond_a \top$

expressing that there is an a-successor of the current state for some $a \in A$.
For a single letter we write $\sigma_a$ instead of $\sigma_{\{a\}}$. For a word $\alpha = a_1 \ldots a_k$ we set
$\sigma_\alpha := \Diamond_{a_1} \ldots \Diamond_{a_k} \top$. Since we deal with pointed transition systems $(T, s)$ we use
the notation ‘a-successor’ without a reference to a state as an abbreviation for
‘a-successor of the origin s’. For later use we define the following SML-formulae.
For $a \in \Sigma$ and $n \in \mathbb{N}$ let



To, a — Tl,a — ^ T?^+2,a 

We write $\gamma_a$ as an abbreviation for $\gamma_{1,a}$. It is easy to see that we can fix the
number of a-successors for a given state by the SML-formula $\gamma_{n,a}$.

Lemma 5. $(T, s) \models \gamma_{n,a}$ iff state s has exactly n different a-successors. □

In contrast to modal logic, SML lacks the tree model property, i.e., there are 
satisfiable SML-formulae which do not have a tree model. 

Lemma 6. The logic SML does not have the acyclic model property. In particu- 
lar it does not have the tree model property and it is not bisimulation-invariant. 

Proof. Consider the SML-formula $\varphi := \sigma_{aa} \wedge \boxminus_a \neg\sigma_a$. Then every transition
system $T$ with $(T, s) \models \varphi$ has only one a-transition which starts and ends in
state s. The last property holds since every bisimulation-invariant logic over
transition systems has the tree model property. □

Another difference to modal logic is that each satisfiable ML-formula has a finite 
model, whereas for SML this property does not hold. 
Theorem 7. The logic SML does not have the finite model property.

Proof. Let $\varphi$ be the following SML-formula:



<)g^CFa A ^gUgGa A 


(Ml) 


a~'^a A 


(M2) 


^glh A 


(M3) 


A 


(M4) 


A ^g^(Tg A 


(M5) 




(M6) 



Then every transition system $T$ with $(T, s) \models \varphi$ has the following properties. By
M1: there is exactly one g-successor which has no a-successors and all other
g-successors have an a-successor. In particular there is a g-successor. By M2: each
g-successor has at most one a-successor. By M3: each g-successor has exactly one
h-successor. In particular there is an h-transition. By M4: each g-a-successor has
exactly one h-successor. By M5: the origin has no a- or h-successors and there
are no g-g-successors. Finally M6 expresses that for every deleted h-transition
there is a g-successor in the corresponding submodel which has an h-successor v
and an a-successor w such that w has no h-successors (maybe with v = w).

It is easy to see that the transition system depicted in Fig. 1 is an infinite
model of $\varphi$ (if pointed at state s). For the last property notice that, if the nth
h-transition from the left is deleted, then the (n + 1)th g-transition from the left
satisfies M6.




Fig. 1. An infinite model of $\varphi$



In the sequel the unique g-successor without a-successors is called the sink state
and is displayed as ►. Further we omit all g- and h-transitions. This means that
in the figures all displayed vertices are g-successors from the origin and have
h-transitions leading to a separate vertex.

We have to show that $\varphi$ has only infinite models. For that we claim:

Claim 1. For every model $T$ with $(T, s) \models \varphi$ we have that every h-transition
starts in a g-a-successor of s.

Proof (of Claim 1). Assume that there is an h-transition which starts in a state
which is not a g-a-successor of s. After deleting this h-transition, M4 is still
valid, i.e., there is no g-a-successor of s without an h-successor, contradicting the
second part of M6.

Claim 2. For every model $T$ with $(T, s) \models \varphi$ the state s has infinitely many
g-successors.

Proof (of Claim 2). Assume that s has only k many g-successors for some $k \in \mathbb{N}$.
Because of M1 and M2 state s has at most k − 1 many g-a-successors. Hence
there exists a g-successor v of s which is not a g-a-successor of s. Due to Property
M3 state v has an h-successor, but then - by Claim 1 - state v has to be a
g-a-successor of s, a contradiction.

This completes the proof of the theorem. □

Note that not all given conjuncts are necessary to obtain an infinite model, but 
we need this construction below. Further the middle part of a model does not 
need to be a single chain which is only unbounded to the right. For example we 
could have further chains which are unbounded to both sides. Other models are 
‘inverse infinite trees’. 



5 Undecidability of Satisfiability for SML 

In this section we show that the satisfiability problem for SML is undecidable. To
be more precise, we show that the problems of deciding whether a given formula

1. has a model (Satisfiability),
2. has a finite model (Finite Satisfiability), and
3. is satisfiable, but has only infinite models (Infinity Axiom)

are undecidable. To that aim we first define three variants of Post’s Correspondence
Problem (cf. [4]) that will be reduced to the mentioned problems.

We fix an alphabet $\Sigma$ with $|\Sigma| \ge 2$. Given two lists $\alpha = (\alpha_1, \ldots, \alpha_n)$ and
$\beta = (\beta_1, \ldots, \beta_n)$ of non-empty words over $\Sigma$ with $n \ge 2$ we formulate the
following correspondence problems:

1. $(\alpha, \beta) \in \mathrm{PCP}_*$ iff there is a finite sequence $(i_1, \ldots, i_k)$ in $\{2, \ldots, n\}$ such
that the finite words $\alpha_1 \alpha_{i_1} \ldots \alpha_{i_k}$ and $\beta_1 \beta_{i_1} \ldots \beta_{i_k}$ are the same,

2. $(\alpha, \beta) \in \mathrm{PCP}_\omega$ iff $(\alpha, \beta) \notin \mathrm{PCP}_*$ and there is an infinite sequence $(i_1, i_2, \ldots)$
in $\{2, \ldots, n\}$ such that the $\omega$-words $\alpha_1 \alpha_{i_1} \alpha_{i_2} \ldots$ and $\beta_1 \beta_{i_1} \beta_{i_2} \ldots$ are the
same,

3. $\mathrm{PCP}_\infty := \mathrm{PCP}_* \cup \mathrm{PCP}_\omega$.

We require that both decompositions start with $\alpha_1$, resp., $\beta_1$ and that the index
1 does not occur again. The sequence $(i_1, \ldots, i_k)$, resp., $(i_1, i_2, \ldots)$ is then
called a finite, resp., an infinite solution. Note that there are two different sorts
of infinite solutions: regular solutions where the sequence is ultimately periodic
and irregular solutions. The usual proof of undecidability of (modified) PCP
can be easily adapted to show the undecidability of these three correspondence
problems (cf. [4]).
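
Although the correspondence problems are undecidable, finite solutions of bounded length can be searched for by brute force, which makes the definition concrete. The sketch below is an illustration only: the function name `finite_solution`, the length bound `max_len` and the restriction to sequences of length at least 1 are assumptions of this example, and the sample instance in the note that follows is not taken from the paper.

```python
from itertools import product

def finite_solution(alpha, beta, max_len):
    """Search for (i_1, ..., i_k) over {2, ..., n} with 1 <= k <= max_len such that
    alpha_1 alpha_{i_1} ... alpha_{i_k} equals beta_1 beta_{i_1} ... beta_{i_k}.

    Both decompositions start with index 1, which must not occur again.
    Returns the first sequence found, or None if there is none within the bound.
    """
    n = len(alpha)
    for k in range(1, max_len + 1):
        for seq in product(range(2, n + 1), repeat=k):
            top = alpha[0] + "".join(alpha[i - 1] for i in seq)
            bottom = beta[0] + "".join(beta[i - 1] for i in seq)
            if top == bottom:
                return seq
    return None
```

For the lists (a, ab) and (aa, b) the search returns the sequence (2), since both decompositions spell the word aab. A result of None only means that no solution exists within the bound.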

Given an instance $(\alpha, \beta)$ of the correspondence problems let $\Gamma := \Sigma \cup \{\#\}$,
$I := \{1, \ldots, n\}$ and $\Delta := \Gamma \cup I \cup \{g, h\}$ (w.l.o.g. $\Sigma \cap I = \emptyset$ and $\Sigma \cap \{g, h, \#\} = \emptyset$).
To show the undecidability of the satisfiability problems we code solutions of the
correspondence problems within a transition system over the alphabet $\Delta$ which
is SML-definable. The models are similar to the ones of Theorem 7, but in
contrast to them they may be finite: we can interrupt the building process by an
‘end marker’ ►. The idea is to represent a solution on the middle chain of the
models, reading it from right to left starting at the marker #. The words of the
first list $\alpha$ start and end in vertices with an I-successor, the corresponding words
of the second list $\beta$ start and end in vertices which are appropriate I-successors.
Figure 2 shows as an example the representation of the finite solution (1,3,2)
for the lists $\alpha$ = (ab, b, abaab) and $\beta$ = (aba, abb, ba). Note that all words are
read from right to left. If the solution is finite both decompositions end in the
unique ►-vertex and the complete model can be made finite. If the solution is
infinite the decomposition never ends and we have an infinite model.

Fig. 2. Representation of a finite solution



Before we give the SML-specifications to realise this idea we introduce some
auxiliary notations. For non-empty words $\alpha$ over $\Sigma$ and an SML-formula $\psi$
we define inductively $\delta_\alpha(\psi)$ by $\delta_a(\psi) := \Diamond_a \psi$ and $\delta_{a\alpha'}(\psi) := \Diamond_a(\neg\sigma_I \wedge \delta_{\alpha'}(\psi))$
($a \in \Sigma$, $\alpha' \in \Sigma^+$). For example $\delta_{abb}(\top)$ expresses that there is an a-b-b-successor,
but neither an a-I- nor an a-b-I-successor. For a non-empty word $\beta = b_1 \ldots b_k$
over $\Sigma$ and an SML-formula $\psi$ let $\delta'_\beta(\psi) := \Diamond_{b_1} \ldots \Diamond_{b_k} \psi$.

To code solutions of the correspondence problems let $\varphi_{\alpha,\beta}$ be the following
SML-formula (we explain the several subformulae below):



Og-'OT A OgOgOT A (SI) 

ng0r“'CT_r A □g0/“'O'/ A (S2) 

□g7b A Og\3rjh A □ h<>g^crh A (S3) 

“■f7zi\{g} A □g-'CTg A (S4) 

□ bOg ((cr# A -iCTb) V {ah A Or-'O'b)) A (S5) 

^9#i A S #Dg-'cr# A -’CTgr# A (S6) 

0g(cTii A Ei-icri) A (S7) 



1^9 [ A 

iGl 



(<5ai(“'0'r) A 0i<5^i(“'O-r)) V 



V i^ai{o-jh) A <}^6'f^.{ah) A 0b(^ai(0g- ’CT/i) A <>iS'i3.{-'ah))) 



(S8) 



S1 expresses that there is exactly one g-successor which has no $\Gamma$-successors
and all other g-successors have a $\Gamma$-successor, whereas S2 ensures that each
g-successor has at most one $\Gamma$-successor and at most one I-successor. S3 requires
that each g-successor and each g-$\Gamma$-successor has exactly one h-successor and
every h-transition starts at a g-successor. S4 expresses that the origin has only
g-successors and that there are no g-g-successors. If S5 is true then for every
deleted h-transition there is a g-successor in the corresponding submodel which
either has a #-successor, but no h-successors; or has an h-successor and a
$\Gamma$-successor v such that v has no h-successors. The next formula S6 ensures that
each model has a g-#-successor w, but only one #-transition at all and that
there is no g-$\Gamma$-#-successor. Further S6 and S7 require that there is exactly one
1-transition which starts and ends in w.

Notice that we do not need to have a finite $\Sigma$-chain starting from the (unique)
g-#-successor. Finally, S8 expresses (together with the previous formulae) that
for every g-successor v, whenever v has an i-successor for some $i \in I$ then either
one reaches the unique sink state of the model by both words $\alpha_i$ and $i \cdot \beta_i$ (see
Fig. 3) or there is some $j \in I$, $j \ne 1$, such that one reaches the same g-successor
by both words $\alpha_i \cdot j$ and $i \cdot \beta_i$ (see Fig. 4). Further, while reading the word $\alpha_i$,
there are no I-successors except for the first and the last vertex.



Fig. 3. First case for S8

Fig. 4. Second case for S8



In the sequel we show some properties which are valid for every transition system
$T$ with state $s \in S$ such that $(T, s) \models \varphi_{\alpha,\beta}$.

Property 1. (1) Every g-successor with an I-successor has a $\Sigma$-successor as well.
(2) Each g-$\Gamma$-successor is also a g-successor. (3) There is a unique g-#-successor
v. (4) The unique sink state has no I-successors. In particular the g-#-successor
is distinct from the sink state.

Proof. (1) If g-successor v has an i-successor then, due to S8, $\delta_{\alpha_i}(\top)$ is true.
In particular $\Diamond_{a_1} \top$ holds, if $\alpha_i = a_1 \ldots a_l$ ($\alpha_i$ is non-empty). Hence v has an
$a_1$-successor. (2) By S3 every g-$\Gamma$-successor has an h-successor, but every
h-transition starts at a g-successor. (3) S6 ensures the existence and, since there
is only one #-transition at all, the uniqueness of v. (4) Immediately by (1) and
since the g-#-successor has a 1-successor. □

Now we inductively define a (finite or infinite) sequence $v_k$ of vertices which can
be found in every model of $\varphi_{\alpha,\beta}$. Let $v_0$ be the unique g-successor that has a
#-successor and let $v_1$ be this #-successor (well-defined by Prop. 1.3). By Prop. 1.2,
$v_1$ is also a g-successor. Further we have $v_0 \ne v_1$ (otherwise $\sigma_{g\#\#}$ holds which
violates S6). Assume that $v_k$ for $k \ge 1$ is already defined such that $v_k$ is the
unique $\Gamma$-successor of $v_{k-1}$. In particular $v_k$ is a g-successor by Prop. 1.2. Case
1: $v_k$ has no $\Sigma$-successor. Due to S6, $v_k$ has no #-successor as well. Since
the sink state is the unique g-successor without a $\Gamma$-successor we have that $v_k$
is the sink state. In this case let $\kappa := k + 1$. Case 2: $v_k$ has a $\Sigma$-successor. Let
$v_{k+1}$ be this $\Sigma$-successor which is unique by S2.

If $v_k$ is defined for all $k \in \omega$ then set $\kappa := \omega$. Note that the $v_k$’s do not
need to be pairwise distinct, since we may have a loop in the $\Gamma$-chain. Now we
extract those $v_k$’s which have I-successors. For that we inductively define the
sequence $j_k$. Let $j_0 := 1$. Assume that $j_k$ is already defined such that $v_{j_k}$ has an
I-successor. If there is m with $j_k < m < \kappa$ such that $v_m$ has an I-successor then
let $j_{k+1}$ be the minimal m with this property. Otherwise let $\lambda := k + 1$.

Again, if $j_k$ is defined for all $k \in \omega$ then set $\lambda := \omega$. For all $k < \lambda$ we have: the
I-successor of $v_{j_k}$ is unique by S2. Hence we set $i_k = i$ iff $v_{j_k}$ has an i-successor.

Property 2. $\kappa < \omega$ iff $\lambda < \omega$.

Proof. ($\Rightarrow$) This is clear by the definition of the $j_k$’s. ($\Leftarrow$) If $\lambda < \omega$ let $m = \lambda - 1$
(note that $\lambda \ge 1$ by definition). Then $v_{j_m}$ is the last vertex in the sequence
$v_1, v_2, \ldots$ which has an I-successor. But for $v_{j_m}$ the last disjuncts of S8 cannot
be satisfied, since otherwise - after $|\alpha_{i_m}|$ steps - we would reach $v_{j_m + |\alpha_{i_m}|}$ which
must have a j-successor for some $j \in I$, $j \ne 1$. So only the first disjunct is
satisfied and $v_{j_m + |\alpha_{i_m}|}$ is the sink state, i.e., $\kappa = j_m + |\alpha_{i_m}| + 1$. □

If $\lambda < \omega$ we set $j_\lambda := \kappa - 1$, i.e., in this case $v_{j_\lambda}$ is the sink state.

Property 3. (1) $i_0 = 1$ and $i_k \in \{2, \ldots, n\}$ for all $k \ge 1$. For all $k < \lambda$ the
following holds: (2) $j_k < j_{k+1}$, (3) $v_{j_k}$ has $\alpha_{i_k}$-successor $v_{j_{k+1}}$ and (4) for all
$j_k < m < j_{k+1}$, $v_m$ has no I-successors. (5) The $i_k$-successor of $v_{j_k}$ is equal to
$v_m$ for some $1 \le m < \kappa$.

Proof. (1) By the definition of the $j_k$’s and by S8. (2) is clear by the definition
of the $j_k$’s. (3) and (4) immediately follow from S8 and the definition of $\delta_\alpha$. We
show (5) by induction. We have $j_0 = 1$ and $i_0 = 1$. S6 and S7 ensure that the
1-successor of $v_1$ is $v_1$ itself. Assume now that the property is already given for k
and that $k + 1 < \lambda$. S8 guarantees that we reach from vertex $v_{j_k}$ the same vertex
by the words $\alpha_{i_k} \cdot i_{k+1}$ and $i_k \cdot \beta_{i_k}$. Since by (3), the $\alpha_{i_k}$-successor of $v_{j_k}$ is exactly
$v_{j_{k+1}}$, this means that the $i_{k+1}$-successor of $v_{j_{k+1}}$ is equal to the $i_k \cdot \beta_{i_k}$-successor
of $v_{j_k}$. By induction the $i_k$-successor of $v_{j_k}$ is equal to $v_m$ for some $1 \le m < \kappa$,
hence the $i_{k+1}$-successor of $v_{j_{k+1}}$ is equal to $v_{m'}$ with $m' = m + |\beta_{i_k}| < \kappa$. □

The last property ensures that we stay on the $\Gamma$-chain starting at $v_1$ if we follow
the I-transitions and that we do not change to other parts, resp., branches within
the model. Now we can define the sequence $l_k$ for $k < \lambda$ by $l_k = m$, if $v_m$ is the
(unique) $i_k$-successor of $v_{j_k}$. Again, if $\lambda < \omega$ we set $l_\lambda := \kappa - 1$. Then we have:

Property 4. (1) $l_0 = 1$. For all $k < \lambda$ the following holds: (2) $l_k < l_{k+1}$, and (3)
$v_{l_k}$ has $\beta_{i_k}$-successor $v_{l_{k+1}}$.

Proof. The 1-successor of $v_1$ is $v_1$, hence (1) holds. By Prop. 3.3 and the definition
of $l_k$ the $i_k$-successor of $v_{j_k}$ is $v_{l_k}$ and the $\alpha_{i_k} \cdot i_{k+1}$-successor of $v_{j_k}$ is $v_{l_{k+1}}$. On
the other hand, by S8, the $i_k \cdot \beta_{i_k}$-successor of $v_{j_k}$ is $v_{l_{k+1}}$ as well. This shows
(3) and, since $\beta_{i_k}$ is non-empty, also (2). □

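Finally, we state the main property (we denote the prefix of a word $\alpha$ of length
k by $\alpha[k]$):

Property 5. (1) If $\kappa < \omega$ then $\alpha_1 \alpha_{i_1} \ldots \alpha_{i_{\lambda-1}} = \beta_1 \beta_{i_1} \ldots \beta_{i_{\lambda-1}}$. (2) If $\kappa = \omega$ then
$(\alpha_1 \alpha_{i_1} \alpha_{i_2} \ldots)[m] = (\beta_1 \beta_{i_1} \beta_{i_2} \ldots)[m]$ for all $m \in \omega$.

Proof. (1) By Prop. 2 we also have $\lambda < \omega$, in particular $j_\lambda$ and $l_\lambda$ are defined.
Since $v_{j_0} = v_1$ and $v_{j_\lambda}$ is the sink state, by Prop. 3.3 we have that $\alpha_1 \alpha_{i_1} \ldots \alpha_{i_{\lambda-1}}$
is exactly the $\Sigma$-word between the vertices $v_1$ and the sink state. On the other
hand, since $v_{l_0} = v_1$ and $v_{l_\lambda}$ is the sink state, by Prop. 4.3 we have that
$\beta_1 \beta_{i_1} \ldots \beta_{i_{\lambda-1}}$ also is the $\Sigma$-word between the vertices $v_1$ and the sink state.
In particular the two words are identical.

(2) For m = 0 this is clear. For m > 0 let $\mu_m := \max\{k \mid j_k \le m\}$ and $\nu_m :=
\max\{k \mid l_k \le m\}$. Again by Prop. 3.3, the word between the vertices $v_1$ and $v_{m+1}$
is exactly the $\Sigma$-word $\alpha_1 \ldots \alpha_{i_{\mu_m - 1}} \cdot (\alpha_{i_{\mu_m}}[m + 1 - j_{\mu_m}]) = (\alpha_1 \alpha_{i_1} \alpha_{i_2} \ldots)[m]$.
On the other hand, again by Prop. 4.3, this word also is $\beta_1 \ldots \beta_{i_{\nu_m - 1}} \cdot (\beta_{i_{\nu_m}}[m +
1 - l_{\nu_m}]) = (\beta_1 \beta_{i_1} \beta_{i_2} \ldots)[m]$. □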

Now we are ready to show the main theorem: 

Theorem 8. The following holds:

1. $(\alpha, \beta) \in \mathrm{PCP}_*$ iff $\varphi_{\alpha,\beta}$ has a finite model,

2. $(\alpha, \beta) \in \mathrm{PCP}_\omega$ iff $\varphi_{\alpha,\beta}$ has an infinite, but no finite model,

3. $(\alpha, \beta) \in \mathrm{PCP}_\infty$ iff $\varphi_{\alpha,\beta}$ is satisfiable.

Fig. 5. An infinite model

Proof. (1) ($\Rightarrow$) Let $(i_1, \ldots, i_k)$ be a finite solution. Then the appropriate finite
chain analogous to the one depicted in Fig. 2 together with the corresponding
g- and h-transitions as depicted in Fig. 1 is a finite model of $\varphi_{\alpha,\beta}$ (if pointed at
vertex s). ($\Leftarrow$) If $\varphi_{\alpha,\beta}$ has a finite model then for that model we have $\kappa < \omega$. By
Prop. 3.1 and Prop. 5.1 this means that $(i_1, \ldots, i_{\lambda-1})$ is a finite solution and
therefore $(\alpha, \beta) \in \mathrm{PCP}_*$.

(2) ($\Rightarrow$) Let $(i_1, i_2, \ldots)$ be an infinite solution and let $a \in \Sigma$ be arbitrary.
Then the transition system depicted in Fig. 5 is a model of $\varphi_{\alpha,\beta}$ (together with
the corresponding g- and h-transitions). The upper infinite chain is labelled with
a’s and the vertices have no I-successors. The lower infinite chain is labelled with
the word $\alpha_1 \alpha_{i_1} \alpha_{i_2} \ldots$ (from right to left starting from the g-#-successor) and
has appropriate I-transitions as described above.

By definition of $\mathrm{PCP}_\omega$ we have $(\alpha, \beta) \notin \mathrm{PCP}_*$ and therefore by (1), $\varphi_{\alpha,\beta}$
has no finite model. ($\Leftarrow$) If $\varphi_{\alpha,\beta}$ has an infinite model then for that model we
have $\kappa = \omega$. Since there are no finite models we have $(\alpha, \beta) \notin \mathrm{PCP}_*$ by (1). By
Prop. 3.1 and Prop. 5.2 this means that $(i_1, i_2, \ldots)$ is an infinite solution and
therefore $(\alpha, \beta) \in \mathrm{PCP}_\omega$.

(3) This is immediate by (1) and (2). □

Applying the undecidability of the correspondence problems we immediately 
obtain the desired result: 

Theorem 9. The problems Satisfiability, Finite Satisfiability, and Infinity Axiom
for SML are undecidable. □

6 Conclusion and Outlook 

We have considered the sabotage modal logic SML, an extension of modal logic 
that is capable of describing elementary changes of structures. Modal logic itself 
is one of the simplest logics for specifying properties of transition systems. We 
have shown that operators that capture basic changes of the structure, namely 
the removal of edges, already strengthen modal logic in such a way that all the 
nice algorithmic and model-theoretic properties of modal logic get lost. In fact, 
from the viewpoint of complexity and model theory SML much more resembles 
first-order logic than modal logic, except for the linear time formula complexity 
of model checking. 

There are some open questions related to SML. For example one may restrict 
the global power of the sabotage operator (e.g., the deleted transition has to 
start at the current state). Model checking of SML for unary alphabets is still 
PSPACE-complete (cf. [5]), but it is open whether satisfiability is then decidable. 
Also, it would be nice to have a characteristic notion of bisimulation for SML 
and to know more about valid principles, e.g. interaction laws between ordinary 
and sabotage modalities. 
Abstract. We consider two problems related to the well-studied sorting
by transpositions problem: (1) Given a permutation, sort it by moving a 
minimum number of strips, where a strip is a maximal substring of the 
permutation which is also a substring of the identity permutation, and 
(2) Given a set of increasing sequences of distinct elements, merge them 
into one increasing sequence with a minimum number of strip moves. 

We show that the merging by strip moves problem has a polynomial 
time algorithm. Using this, we give a 2-approximation algorithm for the 
sorting by strip moves problem. We also observe that the sorting by 
strip moves problem, as well as the sorting by transpositions problem, 
are fixed-parameter-tractable. 

1 Introduction 

Let $\pi$ be a permutation on n elements, written as a string $\pi_1 \pi_2 \ldots \pi_n$. A strip is
a maximal substring of $\pi$ which is also a substring of the identity permutation
$\mathrm{id}_n$. (For example, in the permutation 7, 6, 5, 2, 3, 4, 8, 1 on 8 elements, there are
six strips, and 2 3 4 is the only strip containing more than a single element.)
A strip move is the operation of picking a strip and placing it elsewhere in the
string. We define the Sorting By Strip Moves problem, SBSM, as the problem
of finding the smallest number k such that $\pi$ can be transformed into $\mathrm{id}_n$ with k
strip moves. This smallest number of required strip moves is denoted $s(\pi)$, and
is called the strip move distance of $\pi$.
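
The decomposition of a permutation into strips can be computed in a single left-to-right pass, as the following minimal sketch shows. The function name `strips` and the list-of-lists representation are choices made for this illustration only.

```python
def strips(pi):
    """Split a permutation (a sequence of the integers 1..n) into its strips.

    A strip is a maximal run of consecutive positions whose values increase
    by exactly one, i.e. a maximal substring shared with the identity.
    """
    if not pi:
        return []
    result = [[pi[0]]]
    for x in pi[1:]:
        if x == result[-1][-1] + 1:
            result[-1].append(x)   # x extends the current strip
        else:
            result.append([x])     # x starts a new strip
    return result
```

Applied to the permutation 7, 6, 5, 2, 3, 4, 8, 1 from the example above, the function returns the six strips [7], [6], [5], [2, 3, 4], [8], [1].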

The main motivation for the SBSM problem comes from a purely combi- 
natorial point of view. There is a significant body of literature studying the 
complexity of sorting under various restrictions. This study can be considered 
a contribution in this genre. Furthermore, SBSM is related to another problem 
of immense practical significance, that of Sorting by Transpositions. A
transposition is the operation of picking any substring of $\pi$ and placing it elsewhere
in the string. Let $t(\pi)$ denote the smallest number of transpositions which can
transform $\pi$ to the identity $\mathrm{id}_n$. Computing $t(\pi)$ is the Sorting By
Transpositions problem, SBT. It (and its variants) is a well-studied problem - see for
instance [2,5,4,6,9,8,11,12,13] - because of its connections to genome
rearrangement problems [1,5,14,15]. The exact complexity of SBT is still open. The best
known results today are polynomial time approximation algorithms achieving
an approximation ratio of 1.5 [2,5,12].

Strips are important combinatorial objects in understanding $t(\pi)$, because an
optimal transposition sequence need never break a strip [5]. Since a strip move
is also a transposition, it is clear that $t(\pi) \le s(\pi)$. Let $\#\mathrm{strip}(\pi)$ denote the
number of strips in $\pi$. It is easy to see that a strip move, and even a
transposition, can reduce the number of strips at most by 3. And $\#\mathrm{strip}(\mathrm{id}_n) = 1$. Thus
reducing the number of strips to 1 needs at least $(\#\mathrm{strip}(\pi) - 1)/3$
transpositions. On the other hand, the naive algorithm of repeatedly picking the strip
containing 1 and moving it to the immediate left of the strip logically succeeding
it reduces the number of strips by at least 1 in each move; hence $s(\pi)$ is no more
than $\#\mathrm{strip}(\pi) - 1$. Thus

$\frac{\#\mathrm{strip}(\pi) - 1}{3} \le t(\pi) \le s(\pi) \le \#\mathrm{strip}(\pi) - 1 \qquad (1)$



It follows that the naive algorithm is a 3-approximation for SBSM (and for 
SBT). Our main result in this paper is a 2-approximation algorithm for SBSM, 
Theorem 3 in Section 4. While the complexity of SBSM itself remains unsettled, 
the 2-approximation exposes a lot of interesting combinatorics driving the way 
permutations are transformed. It is to be hoped that some of this combinatorics 
will be useful in understanding the SBT problem as well. 
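
The naive algorithm described above is easy to write down, and running it confirms the upper bound in (1) on concrete inputs. The sketch below is an illustration under the assumption that the input is a permutation of 1..n given as a list; the names `count_strips` and `naive_sort` are hypothetical.

```python
def count_strips(pi):
    """Number of strips: one more than the number of positions where a strip breaks."""
    return 1 + sum(1 for i in range(1, len(pi)) if pi[i] != pi[i - 1] + 1)

def naive_sort(pi):
    """Sort pi by repeatedly moving the strip containing 1 to the immediate
    left of its successor strip; return the number of strip moves used."""
    pi = list(pi)
    n = len(pi)
    identity = list(range(1, n + 1))
    moves = 0
    while pi != identity:
        start = pi.index(1)
        end = start
        while end + 1 < n and pi[end + 1] == pi[end] + 1:
            end += 1
        strip = pi[start:end + 1]               # the strip 1, 2, ..., m
        rest = pi[:start] + pi[end + 1:]
        target = rest.index(strip[-1] + 1)      # position of m + 1
        pi = rest[:target] + strip + rest[target:]
        moves += 1
    return moves
```

On 7, 6, 5, 2, 3, 4, 8, 1 the algorithm uses 4 moves, which lies between (6 - 1)/3 and 6 - 1 as inequality (1) requires. Each move lengthens the strip containing 1 by at least one element, so the loop terminates.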

The rest of the paper is organized as follows. After introducing notation in 
the following section, we show in Section 3 that optimal strip move sequences 
have certain canonical properties. We also show that both SBSM and SBT are 
fixed-parameter tractable. In Section 4 we introduce a new but similar problem 
MBSM, that of optimally merging increasing sequences via strip moves across 
sequences. Sections 5 and 6 then show that MBSM is in P, and that MBSM 
provides a 2-approximation for SBSM, respectively. Section 6 uses the canonical 
properties established in Section 3. 



2 Preliminaries 

We denote by [n] the set {1, 2, ..., n}, and by $S_n$ the set of all permutations over
[n]. We denote by $\mathrm{id}_n$ the identity permutation on n elements and by $\mathrm{rev}_n$ the
reversal of $\mathrm{id}_n$.

For a permutation $\pi$ and an element $a \in [n]$, we denote by $\mathrm{strip}(a, \pi)$ the
strip in $\pi$ containing a, by $\mathrm{last}(a, \pi)$ the largest element of $\mathrm{strip}(a, \pi)$, by
$\mathrm{next}(a, \pi)$ the element immediately following a in $\pi$ if such an element exists
(undefined if $\pi_n = a$), and by $\mathrm{prev}(a, \pi)$ the element immediately preceding a in
$\pi$ if such an element exists (undefined if $\pi_1 = a$). For a (sub)strip x, we denote
by $\mathrm{first}(x, \pi)$ and $\mathrm{last}(x, \pi)$ the smallest and largest elements respectively
of x. By $\mathrm{pred}(x, \pi)$ we mean $\mathrm{strip}(\mathrm{first}(x, \pi) - 1, \pi)$ (when $\mathrm{first}(x, \pi) >
1$), and $\mathrm{succ}(x, \pi)$ denotes $\mathrm{strip}(\mathrm{last}(x, \pi) + 1, \pi)$ (when $\mathrm{last}(x, \pi) < n$). By
“move a to predecessor” we mean “move $\mathrm{strip}(a, \pi)$ to the immediate right of
$\mathrm{pred}(\mathrm{strip}(a, \pi), \pi)$”. Similarly, “move a to successor” means “move $\mathrm{strip}(a, \pi)$
to the immediate left of $\mathrm{succ}(\mathrm{strip}(a, \pi), \pi)$”. By “move 1 to predecessor” we
mean “move $\mathrm{strip}(1, \pi)$ to the extreme left end of the permutation” and by
“move n to successor” we mean “move $\mathrm{strip}(n, \pi)$ to the extreme right end of
the permutation”. We denote by $\mathrm{Inc}(\pi)$ the set $\{a \in [n - 1] \mid \mathrm{next}(a, \pi) > a\}$.
Clearly, $\mathrm{Inc}(\mathrm{id}_n) = [n - 1]$.

Given a permutation $\pi$, replace each strip x by the single element $\mathrm{first}(x, \pi)$
to obtain a sequence S. For $a \in S$, let $\mathrm{rank}(a)$ denote $|\{b \in S \mid b \le a\}|$. Now in
S replace each a by $\mathrm{rank}(a)$ to get a permutation $\sigma$ on $\#\mathrm{strip}(\pi)$ elements. We
denote by $\mathrm{ker}(\pi)$ the permutation $\sigma$ so obtained.



Example 1. For the permutation $\pi$ = 7, 6, 5, 2, 3, 4, 8, 1, $\mathrm{strip}(7, \pi)$ = 7 while
$\mathrm{strip}(3, \pi)$ = 2 3 4. Also, $\mathrm{last}(3, \pi) = 4$; $\mathrm{next}(3, \pi) = 4$ and $\mathrm{next}(4, \pi)
= 8$; $\mathrm{prev}(3, \pi) = 2$, $\mathrm{prev}(2, \pi) = 5$ and $\mathrm{prev}(7, \pi)$ is undefined. Further,
$\mathrm{first}(7, \pi) = 7$, $\mathrm{last}(2\,3\,4, \pi) = 4$, $\mathrm{pred}(7, \pi)$ = 6, $\mathrm{succ}(1, \pi)$ = 2 3 4,
where the first arguments of first, last, pred and succ denote strips.

“move 1 to predecessor” gives 1, 7, 6, 5, 2, 3, 4, 8, “move 1 to successor” gives
7, 6, 5, 1, 2, 3, 4, 8, and “move 8 to successor” gives 7, 6, 5, 2, 3, 4, 1, 8. The set
$\mathrm{Inc}(\pi)$ is {2, 3, 4}, and $\mathrm{ker}(\pi)$ is the permutation 5, 4, 3, 2, 6, 1.
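
The two derived objects used in Example 1, the set Inc and the kernel permutation, follow directly from their definitions. The following sketch is an illustration only; the function names `inc` and `ker` mirror the notation of the text but the code itself is not from the paper.

```python
def inc(pi):
    """Elements a whose immediate right neighbour in pi is larger than a."""
    return {a for a, b in zip(pi, pi[1:]) if b > a}

def ker(pi):
    """Kernel of pi: collapse each strip to its first element, then replace
    every remaining element by its rank among the remaining elements."""
    firsts = [x for i, x in enumerate(pi) if i == 0 or x != pi[i - 1] + 1]
    rank = {a: r for r, a in enumerate(sorted(firsts), start=1)}
    return [rank[a] for a in firsts]
```

For the permutation 7, 6, 5, 2, 3, 4, 8, 1 these functions give the set {2, 3, 4} and the kernel 5, 4, 3, 2, 6, 1, matching Example 1.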



The Sorting By Strip Moves problem, SBSM, can be framed as a decision
problem or an optimization problem or a search problem. In the decision version,
an instance is a permutation $\pi \in S_n$ and a number k, and the question is to
decide if $s(\pi) \le k$. In the other versions, an instance is just a permutation
$\pi \in S_n$, and the desired output is the value of $s(\pi)$ in the optimization version,
and a sequence of $s(\pi)$ moves sorting $\pi$ in the search version.
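
For very small permutations the optimization version can be answered exactly by a breadth-first search over all strip moves. The sketch below is a brute-force illustration, not an algorithm from the paper: its running time is exponential in n, the names `one_strip_move` and `strip_move_distance` are hypothetical, and it assumes that a strip may be reinserted at any position of the remaining string.

```python
from collections import deque

def one_strip_move(pi):
    """All permutations obtained from pi by moving exactly one strip."""
    pi = list(pi)
    n = len(pi)
    # Start positions of the strips, followed by n as an end sentinel.
    starts = [i for i in range(n) if i == 0 or pi[i] != pi[i - 1] + 1] + [n]
    result = set()
    for i, j in zip(starts, starts[1:]):
        strip, rest = pi[i:j], pi[:i] + pi[j:]
        for pos in range(len(rest) + 1):
            candidate = tuple(rest[:pos] + strip + rest[pos:])
            if candidate != tuple(pi):
                result.add(candidate)
    return result

def strip_move_distance(pi):
    """Exact strip move distance of pi, found by breadth-first search."""
    start, goal = tuple(pi), tuple(sorted(pi))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for nxt in one_strip_move(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # unreachable for a permutation: the identity is always reachable
```

The decision version then amounts to comparing the returned value with k. For the reversal 3, 2, 1 the search returns 2, since no single strip move sorts it.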

If S and T are two sequences, we denote by $S \parallel T$ the sequence obtained by
composing S and T. We denote the empty sequence by $\epsilon$.



3 Some Properties of Optimal Strip Move Sequences 

In [5], Christie shows that the following intuitive idea can be formally proved
true: an optimal SBT sequence need never break apart an existing strip, since the
strip would have to be put together again. His formal proof involved establishing
that $t(\pi) = t(\mathrm{ker}(\pi))$. His proof cannot be adapted directly for SBSM; it needs
the introduction of some additional notions.

Definition 1. 1. A strip is a maximal substring of $\pi$ which is also a substring
of the identity permutation $\mathrm{id}_n$. A substrip is a substring of a strip.

2. A (sub)strip move in $\pi$ is the operation of picking a (sub)strip of $\pi$ and
placing it elsewhere in the string.

3. A (sub)strip move in $\pi$ is said to be a canonical (sub)strip move if it places
a (sub)strip x to the immediate right of $\mathrm{pred}(x, \pi)$ or to the immediate left
of $\mathrm{succ}(x, \pi)$. (In particular, if the substrip x is neither a prefix nor a suffix
of any strip of $\pi$, then x cannot be moved in a canonical substrip move.)

4. By $s'(\pi)$, $s(\pi)$ and $sc(\pi)$ we denote the lengths of the shortest sequence of
substrip moves, strip moves and canonical strip moves, respectively, sorting
$\pi$. We call $s'(\pi)$, $s(\pi)$ and $sc(\pi)$ the substrip move distance, the strip move
distance and the canonical strip move distance respectively of $\pi$.
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Clearly, t(7r) < s'(7t) < s(7t) < sc(7t). We show that in fact s'(7t) = sc(7t). 

Lemma 1. $s(\pi) = s'(\pi)$. Furthermore, every optimal substrip move sequence is
actually a strip move sequence.

Proof. We know that $s'(\pi) \le s(\pi)$, so we need to prove that $s(\pi) \le s'(\pi)$
and that optimal substrip move sequences move only strips. We first prove the
following claim.

Claim. No canonical substrip move increases the substrip move distance. That
is, if applying a canonical substrip move to $\sigma$ results in $\pi$, then $s'(\pi) \le s'(\sigma)$.

Proof of Claim: This proof follows from an adaptation of Christie’s proof that
$t(\ker(\pi)) \le t(\pi)$ [5]. Let the canonical move applied to $\sigma$ move substrip $x$ to
its predecessor (the move to successor case is symmetric). Consider a substrip
sorting sequence $\rho_1, \ldots, \rho_s$, with $\sigma^0 = \sigma$, $\sigma^i$ obtained by applying $\rho_i$ to
$\sigma^{i-1}$, and $\sigma^s = \mathrm{id}_n$. We mark the positions occupied by $x$ in $\sigma$ and $\pi$ with a
special symbol $*$. Now perform the moves $\rho_i$ starting with $\pi^0 = \pi$, with the
proviso that the $*$s are considered glued to $a = \mathrm{first}(x, \sigma) - 1$ and only move
with it. Thus if substrip move $\rho_i$ moves $a$, move all the $*$s with it; if $\rho_i$ does not
move $a$, do not move any $*$. All non-$*$ elements are moved exactly as in $\rho_i$. It
is easy to see that the modified moves continue to be strip or substrip moves.
The $\rho$ sequence correctly sorts $\sigma$; in particular, it places all the non-$*$ elements
in correct relative order. Hence the corresponding sequence on $\pi$ also places the
non-$*$ elements in correct relative order. The $*$ elements are in correct relative
order because they were glued to $a$. So the modified sequence correctly sorts $\pi$,
showing that $s'(\pi) \le s'(\sigma)$. □

Let us see why this claim suffices. We induct on $s'(\pi)$. The base case is trivial, and
we now assume that the statement holds for all permutations $\sigma$ with $s'(\sigma) \le s$.
Let $s'(\pi) = s + 1$, and consider an optimal substrip sorting sequence for $\pi$. Let
the first move, $\rho$, move a substrip $x$ of $\pi$ and result in permutation $\sigma$, where
$s'(\sigma) = s$. If $x$ is a proper substrip, then let $uxv$ be the strip of $\pi$ containing
$x$, where at least one of $u$, $v$ is not $\varepsilon$. Now reinserting $x$ between $u$ and $v$ is
a canonical substrip move applied to $\sigma$ giving $\pi$. So by the mentioned claim,
$s'(\pi) \le s'(\sigma) = s$, contradicting our assumption that $s'(\pi) = s + 1$.

Hence $x$ must be a strip; i.e. the first move in any optimal substrip move
sorting sequence for $\pi$ must be a strip move, resulting in a permutation $\sigma$. By
the induction hypothesis, $s(\sigma) = s$ and every optimal substrip move sorting
sequence for $\sigma$ moves only strips.

Thus in every optimal substrip move sorting sequence for $\pi$, the first move is
a strip move by our claim, and the remaining moves are strip moves by induction;
thus the entire sequence moves only strips. □

The above proof is constructive; given a substrip move sorting sequence of length
$l$, we can efficiently find a strip move sorting sequence of length at most $l$.

Corollary 1. No canonical substrip move increases the strip move distance.
That is, if applying a canonical substrip move to $\sigma$ results in $\pi$, then $s(\pi) \le s(\sigma)$.

We now establish that making canonical strip moves is good enough.

Lemma 2. $sc(\pi) = s(\pi) = s'(\pi)$.

Proof. It suffices to show that $sc(\pi) \le s(\pi)$; we prove this by induction on
$s = s(\pi)$. The base case is vacuously true. Now let $\pi$ be a permutation with
$s(\pi) = s + 1$, and let $\pi'$ be the permutation resulting from applying the first
move $\rho$ of an optimal strip move sorting sequence to $\pi$. Clearly, $s(\pi') = s$. So
by the induction hypothesis, $\pi'$ can be sorted by a canonical strip move sorting
sequence, say $\rho_1, \rho_2, \ldots, \rho_s$.

If $\rho$ is a canonical strip move, there is nothing to prove. Otherwise, let $\rho$ move
a strip $x$ to some non-canonical place. Then $x$ must be a strip in $\pi'$. Let $\sigma$ be
the permutation obtained from $\pi$ or from $\pi'$ by moving $x$ to its predecessor (if it
exists, otherwise successor). Note that $\sigma$ can be obtained from $\pi'$ or $\pi$ in a single
canonical strip move. By Corollary 1, $s(\sigma) \le s(\pi') = s$. So by the induction
hypothesis, $sc(\sigma) = s(\sigma) = s$. By construction, $sc(\pi) \le 1 + sc(\sigma) = 1 + s = s(\pi)$,
completing the induction. □
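
For small permutations, the equality of the strip move distance and the canonical strip move distance can be checked by brute force. The sketch below is not part of the paper's argument: it runs a breadth-first search over strip moves, and it assumes that the values 0 and n + 1 act as sentinels at the two ends, so that the strip containing 1 may be moved to the front and the strip containing n to the back (as in Example 1). All function names are hypothetical.

```python
from collections import deque


def strip_bounds(p):
    """Index ranges (start, end) of the strips of the tuple p."""
    bounds, start = [], 0
    for i in range(1, len(p) + 1):
        if i == len(p) or p[i] != p[i - 1] + 1:
            bounds.append((start, i))
            start = i
    return bounds


def moves(p, canonical):
    """All permutations reachable from p by one (canonical) strip move."""
    n, out = len(p), set()
    for a, b in strip_bounds(p):
        x, rest = p[a:b], p[:a] + p[b:]
        if canonical:
            # Right of the element first(x) - 1, or left of last(x) + 1;
            # 0 and n + 1 are sentinels at the two ends (assumption).
            lo, hi = x[0] - 1, x[-1] + 1
            targets = {0 if lo == 0 else rest.index(lo) + 1,
                       len(rest) if hi == n + 1 else rest.index(hi)}
        else:
            targets = range(len(rest) + 1)
        for t in targets:
            q = rest[:t] + x + rest[t:]
            if q != p:               # skip moves that change nothing
                out.add(q)
    return out


def distance(p, canonical=False):
    """Breadth-first search for the fewest moves sorting p."""
    goal = tuple(range(1, len(p) + 1))
    seen, queue = {p}, deque([(p, 0)])
    while queue:
        q, d = queue.popleft()
        if q == goal:
            return d
        for r in moves(q, canonical):
            if r not in seen:
                seen.add(r)
                queue.append((r, d + 1))


p = (1, 4, 3, 2, 6, 5)
print(distance(p), distance(p, canonical=True))  # 3 3
```

The search is exponential in general and is meant only as a sanity check on permutations of a handful of elements.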

Unlike in Lemma 1, it is not true that the moves of every optimal strip move
sorting sequence must be canonical. For example, sorting 1, 4, 3, 2, 6, 5 as
1, 3, 2, 4, 6, 5 to 1, 2, 3, 4, 6, 5 to 1, 2, 3, 4, 5, 6 is optimal, but the first move is not canonical.
However, the mere existence of optimal canonical strip move sequences gives the 
following results. 

Corollary 2. $s(\pi) = s(\ker(\pi))$.

From Lemma 2, there exists a canonical strip move that decreases the strip move
distance. There are at most $O(n)$ choices of a canonical move. Hence,

Corollary 3. The decision, optimize and search versions of SBSM are polynomially
equivalent: if any one version is in P, then all versions are in P.

The fixed-parameter tractability (FPT) paradigm [7] has proved to be immensely
useful and practical in the context of problems arising from molecular
biology (see for instance [3,10]). It is thus natural to look at the complexity of
SBSM in this framework. Let $l = \#\mathrm{strip}(\pi)$. From Equation 1, if $l > 3k + 1$
then $k < t(\pi) \le s(\pi)$. There are $O(l)$ choices for the first (canonical) strip move
(by Lemma 2) and $O(l^3)$ for the first transposition which doesn’t break strips
(by Christie’s result [5]). This gives the following corollary.

Corollary 4. SBSM and SBT are fixed-parameter tractable (FPT).

That is, on input $\pi \in S_n$, $k \le n$, with parameter $k$, there is an algorithm to
decide if $s(\pi) \le k$ or $t(\pi) \le k$, with run time $f(k)\,n^c$ for some function $f$ and
some constant $c$ independent of $k$.
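
The argument behind the SBSM half of this corollary is a bounded search tree: reject when the strip count already exceeds 3k + 1, otherwise branch over the O(l) canonical strip moves and recurse with budget k - 1. A minimal sketch follows, assuming sentinels 0 and n + 1 at the two ends and a non-empty input; the names `canonical_moves` and `sbsm_at_most` are illustrative and not from the paper.

```python
def strip_bounds(p):
    """Index ranges (start, end) of the strips of the tuple p."""
    bounds, start = [], 0
    for i in range(1, len(p) + 1):
        if i == len(p) or p[i] != p[i - 1] + 1:
            bounds.append((start, i))
            start = i
    return bounds


def canonical_moves(p):
    """Permutations obtained by moving one strip next to its predecessor
    or successor (0 and n + 1 are treated as sentinels; an assumption)."""
    n, out = len(p), set()
    for a, b in strip_bounds(p):
        x, rest = p[a:b], p[:a] + p[b:]
        lo, hi = x[0] - 1, x[-1] + 1
        for t in (0 if lo == 0 else rest.index(lo) + 1,
                  len(rest) if hi == n + 1 else rest.index(hi)):
            q = rest[:t] + x + rest[t:]
            if q != p:
                out.add(q)
    return out


def sbsm_at_most(p, k):
    """Decide whether p can be sorted with at most k strip moves."""
    l = len(strip_bounds(p))
    if l == 1:               # a single strip: p is the identity
        return True
    if l > 3 * k + 1:        # strip-count lower bound exceeds the budget
        return False
    return any(sbsm_at_most(q, k - 1) for q in canonical_moves(p))


print(sbsm_at_most((4, 3, 2, 1), 2))  # False
print(sbsm_at_most((4, 3, 2, 1), 3))  # True
```

Each node has at most 2l children with l at most 3k + 1, and the depth is at most k, which gives a tree whose size depends on k only.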

The bounds of Equation 1 are all tight, and there are permutations for which
the inequalities are strict. For instance, in the Lemma below, (a) follows from
Lemma 2 and Corollary 2, while (b) follows from the fact that for $n \ge 3$,
$t(\mathrm{rev}_n) = \lceil (n + 1)/2 \rceil$ [5,9]. More examples illustrating the other bounds can
similarly be constructed.

Lemma 3. (a) For $n \ge 1$, $s(\mathrm{rev}_n) = n - 1 = \#\mathrm{strip}(\mathrm{rev}_n) - 1$.

(b) For $n \ge 5$, $(\#\mathrm{strip}(\mathrm{rev}_n) - 1)/3 < t(\mathrm{rev}_n) < s(\mathrm{rev}_n)$.



4 Merging By Strip Moves



We introduce a new problem which will be used to obtain the 2-approximation
for SBSM. The input to this problem is a multiset $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$ of
disjoint increasing sequences whose union is $[n]$. That is, each $S_i$ is an increasing
sequence, and $S_1 \,\|\, S_2 \,\|\, \cdots \,\|\, S_k \in S_n$. (Some of the sequences could be
empty; barring those, $\mathcal{S}$ is just a set.) The goal is to transform $\mathcal{S}$ to the multiset
$\mathcal{M}_n = \{\mathrm{id}_n, \varepsilon, \ldots, \varepsilon\}$. The moves allowed are of the form: Pick a strip from any
sequence $S_i$, and insert it into some other sequence $S_j$ in the unique place such
that the new sequence $S_j$ is again an increasing sequence. Here, a strip is defined
as a maximal substring of some $S_i$ which is also a substring of $\mathrm{id}_n$.

The Merging By Strip Moves problem, MBSM, is the problem of finding
the smallest number $k$ such that $\mathcal{S}$ can be transformed into $\mathcal{M}_n$ with $k$ moves.
This smallest number of required moves is denoted $m(\mathcal{S})$.

Analogous to the SBSM setting, we can define a canonical strip move in a
merging-by-strip-move sequence; such a move inserts a strip into the sequence
containing its predecessor or its successor strip. Let $mc(\mathcal{S})$ denote the smallest
number of canonical strip moves required to merge $\mathcal{S}$ into $\mathcal{M}_n$.

It is easy to see that any strip move in a strip-move-merging sequence can
reduce the number of strips at most by 2. But $\#\mathrm{strip}(\mathcal{S}) - 1$ canonical strip
moves suffice: repeatedly move the strip containing 1 to its successor. Thus,
similar to Equation 1, we have

$$\frac{\#\mathrm{strip}(\mathcal{S}) - 1}{2} \;\le\; m(\mathcal{S}) \;\le\; mc(\mathcal{S}) \;\le\; \#\mathrm{strip}(\mathcal{S}) - 1 \qquad (2)$$

We will show in Lemma 4 that in fact $mc(\mathcal{S}) = m(\mathcal{S})$; that is, canonical strip
moves suffice. It can be shown that the remaining inequalities above can be strict
and that the bounds are tight.
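
The upper bound in Equation 2 comes from one concrete strategy: keep moving the strip that contains 1 to the front of the sequence that starts with its successor. A small Python sketch of this strategy is given below; the instance is represented as a list of increasing lists, and both function names are hypothetical.

```python
def count_strips(seqs):
    """Number of strips over all sequences of an MBSM instance."""
    return sum(1 for s in seqs for i, v in enumerate(s)
               if i == 0 or s[i - 1] + 1 != v)


def merge_by_moving_one(seqs):
    """Repeatedly move the strip containing 1 to its successor strip and
    return the number of moves used until a single sequence id_n remains."""
    seqs = [list(s) for s in seqs if s]
    n = sum(len(s) for s in seqs)
    moves = 0
    while True:
        src = next(s for s in seqs if s[0] == 1)
        a = 1                           # the strip containing 1 is 1..a
        while a < len(src) and src[a] == a + 1:
            a += 1
        if a == n:
            return moves
        # a + 1 is the smallest element outside the strip, so it is the
        # first element of some other sequence.
        dst = next(s for s in seqs if s[0] == a + 1)
        dst[:0] = src[:a]
        del src[:a]
        seqs = [s for s in seqs if s]   # drop sequences that became empty
        moves += 1


instance = [[1, 3], [2, 4]]
print(count_strips(instance), merge_by_moving_one(instance))  # 4 3
```

On this instance the strategy uses 3 = #strip - 1 moves, although two moves suffice (move 2 between 1 and 3, then 4 after 3), so the upper bound need not be attained by an optimal sequence.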

For an MBSM instance $\mathcal{S}$ on $[n]$, we denote by $\mathrm{Inc}(\mathcal{S})$ the set $\{a \in [n - 1] \mid$
$a$ is not the last element of any sequence of $\mathcal{S}\}$. We have $\mathrm{Inc}(\mathcal{M}_n) = [n - 1]$.

The following theorems are the main results of this paper; the next two 
sections are devoted to proving Theorems 1 and 2 respectively. 

Theorem 1. MBSM is in P. 



Theorem 2. Given an instance $\pi$ of SBSM, in $O(n)$ time we can construct an
instance $\mathcal{S}$ of MBSM such that $s(\pi) \le m(\mathcal{S}) \le 2s(\pi)$.

From these two theorems, we conclude: 

Theorem 3. SBSM has a polynomial time 2-approximation algorithm.

Since all the proofs in this paper are (efficiently) constructive, we also have a
constructive version of Theorem 3: in polynomial time, we can find a strip move
sorting sequence no longer than twice the optimal.



5 MBSM Is in P

We present a dynamic programming algorithm to exactly compute $m(\mathcal{S})$. The
strategy is to construct a related graph, and identify a substructure in that
graph corresponding to an optimal merging sequence. The substructure is defined
below, related to $m(\mathcal{S})$ in Lemma 4, and shown to be efficiently computable in
Lemma 5. Theorem 1 follows from these lemmas.

Given an instance $\mathcal{S}$ of MBSM, we build an associated directed graph $G$
as follows: $G$ has $n$ vertices numbered 1 through $n$. The multiset $\mathcal{S}$ having $k$
sequences partitions the vertices into $k$ parts, and the edges of $G$ form a total
order on each part. Formally,

Definition 2. Let $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$. Then

1. $G = (V, E)$ where $V = [n]$, and $E = \{(u, v) \mid u < v, \ \exists p \in [k] : u, v \in S_p\}$.

2. Edges in $E$ of the form $(i, i + 1)$ are called unit edges.

3. For $1 \le i < j \le n$, $G([i, j])$ denotes the subgraph of $G$ induced by the vertex
set $\{i, i + 1, \ldots, j\}$, and its edge set is denoted $E([i, j])$.

4. Edges $(i, j)$ and $(k, l)$ are said to cross if $i \le k < j \le l$ or $k \le i < l \le j$. A
set $E' \subseteq E$ is said to be a non-crossing set if no two edges in $E'$ cross.
(Note that $(i, j)$ and $(i, k)$ cross, but not $(i, j)$ and $(k, l)$ where $i < k < l < j$.)

5. By $c(i, j)$ we denote the size of the largest non-crossing set in $G([i, j])$. We
denote $c(1, n)$ by $c(\mathcal{S})$.

Proposition 1. For every $1 \le i < j \le n$, any largest non-crossing set in
$G([i, j])$ contains all the unit edges of $G([i, j])$.

Proof. Let $C$ be a largest non-crossing set in $G([i, j])$. Assume that some unit
edge $(p, p + 1) \in E([i, j])$ is not in $C$; then it must cross some edge in $C$. Assume
that it crosses an edge $(p, q) \in C$, where $q > p + 1$. Then $C$ cannot have any edge
of the form $(p, r)$ where $r > p + 1$, $r \ne q$, since such an edge crosses $(p, q)$. Also,
$C$ cannot have any edge of the form $(r, p + 1)$ where $r < p$, since such an edge
also crosses $(p, q)$. Now $C \setminus \{(p, q)\} \cup \{(p, p + 1), (p + 1, q)\}$ is a larger non-crossing
set in $G([i, j])$, a contradiction. Similarly, we can show that $(p, p + 1)$ does not
cross any edge $(r, p + 1) \in C$, where $r < p$. By definition, $(p, p + 1)$ does not cross
any other type of edge. So $C$ must contain $(p, p + 1)$. □

Corollary 5. If every non-empty sequence of $\mathcal{S}$ is a strip, then $c(\mathcal{S})$ equals
$n - \#\mathrm{strip}(\mathcal{S})$. In particular, $c(\mathcal{M}_n) = n - 1$.

Lemma 4. $mc(\mathcal{S}) = m(\mathcal{S}) = n - 1 - c(\mathcal{S})$.

Proof. By definition, $m(\mathcal{S}) \le mc(\mathcal{S})$. We will show that $mc(\mathcal{S}) \le n - 1 - c(\mathcal{S})$
and $m(\mathcal{S}) \ge n - 1 - c(\mathcal{S})$, so that both quantities equal $n - 1 - c(\mathcal{S})$. We first
prove the following claims:

Claim. Let $\mathcal{S}_1$ be an instance of MBSM, and let $\rho$ be a strip or a substrip move
giving the MBSM instance $\mathcal{S}_2$. Then $c(\mathcal{S}_1) \le c(\mathcal{S}_2) + 1$.

Proof of Claim: Let $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ respectively denote the
digraphs associated with $\mathcal{S}_1$ and $\mathcal{S}_2$. Let $\rho$ move a (sub)strip $a, \ldots, b$.

Let $C_1$ be a largest non-crossing set in $G_1$. Partition $C_1$ as $A_1 = C_1 \setminus E_2$,
$A_2 = C_1 \cap E_2$. Since $C_1$ has all unit edges of $E_1$, $A_1$ can have at most two
edges, one of the form $(i, a)$ and one of the form $(b, j)$. If $|A_1| \le 1$, then $A_2$ is a
non-crossing set in $G_2$ of size at least $|C_1| - 1$. Otherwise, $C_2 = A_2 \cup \{(i, j)\}$ is a
non-crossing set in $G_2$ of size $|C_1| - 1$. Thus in either case $c(\mathcal{S}_2) \ge c(\mathcal{S}_1) - 1$. □

Claim. Let $\mathcal{S}$ be an instance of MBSM, and let $\rho$ be a strip move giving the
MBSM instance $\mathcal{S}'$. Then $c(\mathcal{S}') \le c(\mathcal{S}) + 1$.

Proof of Claim: Suppose $\rho$ moves the strip $x$. Then moving $x$ back to its original
place in $\mathcal{S}$ is a strip or substrip move from $\mathcal{S}'$ to $\mathcal{S}$. By the preceding claim,
$c(\mathcal{S}') \le c(\mathcal{S}) + 1$. □

To prove $mc(\mathcal{S}) \le n - 1 - c(\mathcal{S})$: The proof proceeds by induction on $r =
n - 1 - c(\mathcal{S})$. The basis case is clear, since $r = 0 \iff \mathcal{S} = \mathcal{M}_n$. For the inductive
step, let $C$ be a largest non-crossing set in $G$, of size $n - 1 - r$.

If $C$ has only unit edges, then it can be argued that every non-empty sequence
of $\mathcal{S}$ must be a strip. So by Corollary 5 and Equation 2, $c(\mathcal{S}) = |C| = n -
\#\mathrm{strip}(\mathcal{S})$ and $mc(\mathcal{S}) \le \#\mathrm{strip}(\mathcal{S}) - 1 = (n - |C|) - 1 = n - c(\mathcal{S}) - 1$.

If $C$ has non-unit edges, find a non-unit edge $(i, j)$ which does not contain any
other non-unit edge. (Edge $(i, j)$ contains edge $(k, l)$ if $i < k < l < j$.) Clearly,
such an edge must exist. By definition of non-crossing set, $(i, i + 1) \notin C$. Then
from Proposition 1, $(i, i + 1) \notin E$. Also, if $k = \mathrm{last}(i + 1, \mathcal{S})$, then $C$ has no edge
of the form $(k, l)$, since any such edge would be contained in $(i, j)$ if $l < j$ and
would cross $(i, j)$ otherwise. Perform the canonical strip move “move $i + 1$ to
predecessor”, giving $\mathcal{S}'$ with digraph $G'$. By the preceding claim, $c(\mathcal{S}') \le c(\mathcal{S}) + 1$.
But $C \setminus \{(i, j)\} \cup \{(i, i + 1), (k, j)\}$ is a non-crossing set in $G'$, of size $|C| + 1$, and
so $c(\mathcal{S}') \ge c(\mathcal{S}) + 1$. Hence $c(\mathcal{S}') = c(\mathcal{S}) + 1 = n - 1 - (r - 1)$. Using the induction
hypothesis, $mc(\mathcal{S}) \le 1 + mc(\mathcal{S}') \le 1 + (n - 1 - c(\mathcal{S}')) = n - 1 - c(\mathcal{S})$.

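To prove $m(\mathcal{S}) \ge n - 1 - c(\mathcal{S})$: Let $m(\mathcal{S}) = m$ and let $\rho_1, \ldots, \rho_m$ be an optimal
strip-move merging sequence for $\mathcal{S}$. Applying these moves sequentially to $\mathcal{S}$ gives
$\mathcal{S}_1, \ldots, \mathcal{S}_m = \mathcal{M}_n$. By the above claim, $c(\mathcal{S}_m) \le c(\mathcal{S}_{m-1}) + 1 \le \ldots \le c(\mathcal{S}) + m$.
But by Corollary 5, $c(\mathcal{S}_m) = n - 1$. Hence $m(\mathcal{S}) = m \ge n - 1 - c(\mathcal{S})$. □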



Lemma 5. $c(\mathcal{S})$ can be computed in $O(n^3)$ time.

Proof. Let $t(i, j)$ denote the size of the largest non-crossing set in $G([i, j])$ that
uses the edge $(i, j)$, and let $q(i, j)$ denote the size of the largest non-crossing set
in $G([i, j])$ that does not use the edge $(i, j)$. We present recurrence relations for
the $c(i, j)$’s, $t(i, j)$’s and $q(i, j)$’s, which directly lead to an $O(n^3)$ time dynamic
programming algorithm for computing $c(1, n)$. The recurrences are as follows:

For $i \in [n]$:
$$c(i, i) = 0$$

For $i \in [n - 1]$:
$$c(i, i + 1) = \begin{cases} 1 & \text{if } (i, i + 1) \in E \\ 0 & \text{otherwise} \end{cases}$$

For $1 \le i \le j - 2$ and $j \le n$:
$$c(i, j) = \max\{t(i, j),\, q(i, j)\}$$
$$t(i, j) = \begin{cases} 1 + c(i + 1, j - 1) & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$
$$q(i, j) = \max_{i < k < j} \bigl( c(i, k) + c(k, j) \bigr)$$

Let us see why these recurrences hold.

The recurrence for $c(i, j)$: Follows directly from the definition.

The recurrence for $t(i, j)$: If $(i, j) \notin E$, then $t(i, j) = 0$. Otherwise, if $(i, j)$ is
in a non-crossing set $C$, then $C$ cannot have any other edge incident on $i$ or $j$.
For $C$ to be a largest such set, $C \setminus \{(i, j)\}$ must be a largest non-crossing set in
$G([i + 1, j - 1])$, giving the claimed recurrence.

The recurrence for $q(i, j)$: Clearly, $q(i, j) \ge \max_{i < k < j} c(i, k) + c(k, j)$. To see
the other direction, we show how to find a $k$ such that $q(i, j) = c(i, k) + c(k, j)$.
Let $E'([i, j]) = E([i, j]) \setminus \{(i, j)\}$, and let $C$ be a largest non-crossing subset of
$E'([i, j])$, so that $|C| = q(i, j)$. By definition of crossing edges, $C$ has at most
one edge incident on $i$.

Case 1: Some edge $(i, k)$ incident on $i$ is present in $C$. Let $A = C \cap E([i, k])$
and $B = C \cap E([k, j])$. Then $A$ and $B$ are disjoint, and $C$ must be equal to
$A \cup B$. Now $A$ is a non-crossing set in $G([i, k])$, and it must be a largest such
set. Similarly, $B$ must be a largest non-crossing set in $G([k, j])$. Thus $q(i, j) =
|C| = |A| + |B| = c(i, k) + c(k, j)$.

Case 2: In $C$, $i$ is isolated. Then $(i, i + 1) \notin E'([i, j])$ and $C \subseteq E([i + 1, j])$.
Hence, $c(i, i + 1) = 0$ and $q(i, j) = c(i, i + 1) + c(i + 1, j)$. □
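
The recurrences translate directly into a table-filling procedure, and Lemma 4 then gives the merging distance as n - 1 - c(1, n). The following Python sketch is one possible rendering, not the authors' implementation: an instance is given as a list of increasing lists over 1..n, and the name `merge_distance` is hypothetical.

```python
def merge_distance(seqs):
    """Return n - 1 - c(1, n), where c is the largest non-crossing set size."""
    n = sum(len(s) for s in seqs)
    if n == 0:
        return 0
    # part[v] = index of the sequence containing v; (i, j) with i < j is an
    # edge of G exactly when i and j lie in the same sequence.
    part = {v: idx for idx, s in enumerate(seqs) for v in s}

    def edge(i, j):
        return part[i] == part[j]

    # c[i][j] for 1 <= i <= j <= n; c[i][i] = 0 by initialisation.
    c = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n):
        c[i][i + 1] = 1 if edge(i, i + 1) else 0
    for width in range(2, n):
        for i in range(1, n - width + 1):
            j = i + width
            t = 1 + c[i + 1][j - 1] if edge(i, j) else 0           # uses (i, j)
            q = max(c[i][k] + c[k][j] for k in range(i + 1, j))     # avoids (i, j)
            c[i][j] = max(t, q)
    return n - 1 - c[1][n]


print(merge_distance([[1, 3], [2, 4]]))  # 2
print(merge_distance([[3], [2], [1]]))   # 2
```

There are O(n^2) table entries and each takes O(n) time to fill, which matches the cubic bound of Lemma 5.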

6 MBSM Can Approximate SBSM 

Any permutation $\pi$ can be uniquely decomposed into maximal increasing
substrings. Considering each of these substrings as a sequence gives an instance of
MBSM. We show that this MBSM instance provides an approximation for $s(\pi)$.

Definition 3. Let $S$ be any non-empty sequence $a_1, a_2, \ldots, a_t$ of distinct
elements. The set $\mathcal{S}_S$ of increasing sequences which are substrings of $S$ is
constructed as follows. Let $l$ be the maximum number such that $a_1 < a_2 < \ldots <
a_{l-1} < a_l$. If $l = t$, then $\mathcal{S}_S = \{S\}$. Otherwise, $\mathcal{S}_S = \{S_1\} \cup \mathcal{S}_{S'}$, where $S_1$ is the
sequence $(a_1, a_2, \ldots, a_l)$, and $S = S_1 \,\|\, S'$.

In particular, when $S$ is a permutation, then $\mathcal{S}_S$ is an instance of MBSM. The
examples below show that $m(\mathcal{S}_\pi)$ and $s(\pi)$ can behave quite differently.
Nonetheless, we show in Lemmas 6 and 7 that $m(\mathcal{S}_\pi)$ is in fact a 2-approximation for
$s(\pi)$. Theorem 2 follows from these lemmas.

Example 2. 1. Consider the permutation $\pi$ on $2n$ elements where $\pi_{2k-1} = k$
and $\pi_{2k} = n + k$. Then $s(\pi) = n - 1$ whereas $m(\mathcal{S}_\pi) = 2n - 2$.

2. For $\pi = \mathrm{rev}_n$, $s(\pi) = m(\mathcal{S}_\pi) = n - 1$.
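
The construction of the MBSM instance from a permutation, together with the values quoted in Example 2, can be reproduced with a short script. This is a sketch only: `increasing_runs` and `merge_distance` are names chosen here, and the merging distance is computed as n - 1 - c(1, n) using the recurrences from the proof of Lemma 5.

```python
from functools import lru_cache


def increasing_runs(perm):
    """Decompose a permutation into its maximal increasing substrings."""
    runs = []
    for v in perm:
        if runs and runs[-1][-1] < v:
            runs[-1].append(v)
        else:
            runs.append([v])
    return runs


def merge_distance(seqs):
    """n - 1 - c(1, n), with c given by the non-crossing-set recurrences."""
    n = sum(len(s) for s in seqs)
    part = {v: idx for idx, s in enumerate(seqs) for v in s}

    @lru_cache(maxsize=None)
    def c(i, j):
        if j <= i:
            return 0
        if j == i + 1:
            return int(part[i] == part[j])
        t = 1 + c(i + 1, j - 1) if part[i] == part[j] else 0
        q = max(c(i, k) + c(k, j) for k in range(i + 1, j))
        return max(t, q)

    return n - 1 - c(1, n)


n = 3
interleaved = [x for k in range(1, n + 1) for x in (k, n + k)]  # 1, 4, 2, 5, 3, 6
reverse = list(range(n, 0, -1))                                 # 3, 2, 1
print(increasing_runs(interleaved))                  # [[1, 4], [2, 5], [3, 6]]
print(merge_distance(increasing_runs(interleaved)))  # 4
print(merge_distance(increasing_runs(reverse)))      # 2
```

For n = 3 the interleaved permutation gives merging distance 4 = 2n - 2 and the reversal gives 2 = n - 1, in line with the example.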



Lemma 6. $m(\mathcal{S}_\pi) \le 2s(\pi)$.

Proof. It suffices to show that given any canonical strip move sorting sequence
for $\pi$ of length $l$, there is a strip move merging sequence for $\mathcal{S}_\pi$ with at most $2l$
moves. Let $\rho_1, \rho_2, \ldots, \rho_l$ be a sequence of canonical strip moves sorting $\pi$. Let
$\pi = \pi^0$, and let $\pi^{i+1}$ result from applying strip move $\rho_{i+1}$ to $\pi^i$. Consider a
direct simulation of these moves on $\mathcal{S}_\pi$. The first move can always be performed.
At later stages, a strip being moved in the permutation may not be a strip in
the corresponding set of sequences but may consist of several fragments. At the
end too, when $\pi^l = \mathrm{id}_n$, we may have a set of single-strip sequences instead of
$\mathcal{M}_n$, and the strips have to be collected to give $\mathcal{M}_n$.

To handle the first problem described above, we collect the fragments through
a series of strip moves, and then move the built strip. For the second problem,
at the end, we move the non-empty single-strip sequences into one sequence.
Thus we make $l$ moves corresponding to the $\rho_i$’s, and some extra moves to collect
fragments, and some extra moves at the end to merge strips. Every time we
collect more than 2 fragments to form a bigger strip, we empty out some sequence,
which saves a move in the final merging step. A careful analysis shows that the
number of extra moves required does not exceed $l$.

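Formally, let $\mathcal{S}_\pi = \{S_1, S_2, \ldots, S_k\}$ be denoted by $\mathcal{S}_0$. In the first $l$ stages, we
describe how to construct, for each $i < l$, an instance $\mathcal{S}_{i+1}$ of MBSM from the
MBSM instance $\mathcal{S}_i$. ($\mathcal{S}_i$ is not necessarily $\mathcal{S}_{\pi^i}$.) Let $\mathcal{S}_i = \{S^i_1, S^i_2, \ldots, S^i_k\}$, where
we list the sequences in the same order (that is, for each $1 \le j \le k$, $S^i_j$ is the
sequence obtained from $S^{i-1}_j$ after performing move $\rho_i$). We will maintain the
following invariants, which are clearly true for $i = 0$:

$$\forall i \quad S^i_1 \,\|\, S^i_2 \,\|\, \ldots \,\|\, S^i_k = \pi^i \qquad (3)$$

$$\forall i \quad \mathrm{Inc}(\mathcal{S}_i) \subseteq \mathrm{Inc}(\pi^i) \qquad (4)$$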

Let $\rho_{i+1}$ move a strip $x$ of $\pi^i$ to obtain $\pi^{i+1}$. We simulate $\rho_{i+1}$ by performing
a defragment phase followed by a strip move. If $x$ is already a strip in $\mathcal{S}_i$, then
the defragment phase is empty. Otherwise, due to the invariants we maintain,
there are indices $u < v$ such that $x = (\text{last strip of } S^i_u) \,\|\, S^i_{u+1} \,\|\, \ldots \,\|\, S^i_{v-1} \,\|\,
(\text{first strip of } S^i_v)$. The defragment phase consists of at most $v - u$ moves which
collect these fragments, in the same relative order, into a single sequence. This
gives the MBSM instance $\mathcal{S}'_i$ in which $x$ is a strip. Clearly, composing the
sequences of $\mathcal{S}'_i$ still gives $\pi^i$.

If $\rho_{i+1}$ moves $x$ to its predecessor, do the same in $\mathcal{S}'_i$ to obtain $\mathcal{S}_{i+1}$; otherwise,
if $\rho_{i+1}$ moves $x$ to its successor, do the same to obtain $\mathcal{S}_{i+1}$. (Thus, if $\rho_{i+1}$ moves
$x$ to between its predecessor and successor, and these are in different sequences
of $\mathcal{S}'_i$, then we move $x$ to the sequence containing its predecessor.) It is clear that
this move maintains the invariants of Equations 3 and 4.

Since $S^l_1 \,\|\, S^l_2 \,\|\, \ldots \,\|\, S^l_k = \pi^l = \mathrm{id}_n$, each non-empty sequence in $\mathcal{S}_l$ must be
a strip. In the last $(l + 1)$th stage, we repeatedly move the strip containing 1 to
its successor strip until $\mathcal{M}_n$ is obtained.

It remains to analyze the length of this merging sequence. We bound this
by $2l$ by choosing a suitable potential function and performing an amortized
analysis of each stage.

Let $c_i$ denote the cost of the $i$th phase. For $1 \le i \le l$, $c_i$ is of the form
$1 + t_i$, where $t_i$ is the number of moves required in the defragment phase of the
$i$th stage. And $c_{l+1} = \#\mathrm{strip}(\mathcal{S}_l) - 1$. Further, let $P_i$ and $Q_i$ denote the sets
$\mathrm{Inc}(\pi^i)$ and $\mathrm{Inc}(\mathcal{S}_i)$, with sizes $p_i$, $q_i$, respectively. From Equation 4, $p_i \ge q_i$.
We define the potential function $\phi_i$ to be $p_i - q_i$. Note that $\phi_0 = \phi_{l+1} = 0$, and
$\phi_l = \#\mathrm{strip}(\mathcal{S}_l) - 1$. Now, the amortized cost of the $i$th stage, $a_i$, is given by
$a_i = c_i + \phi_i - \phi_{i-1}$. Since $\phi_0 = \phi_{l+1} = 0$, the total real cost $\sum_{i=1}^{l+1} c_i$ equals
the total amortized cost $\sum_{i=1}^{l+1} a_i$. Clearly, $a_{l+1} = 0$. We now show that all the
remaining phases have an amortized cost of at most 2. That is, for $1 \le i \le l$,
$a_i \le 2$.

Let $\rho_i$ move a strip $x$ with $\mathrm{first}(x, \pi^{i-1}) = b$ and $\mathrm{last}(x, \pi^{i-1}) = c$, to
between elements $e$ and $f$. One of $e$ and $f$ could be undefined, if the move is to
an extreme end. Let us assume that $e$ is defined, the other case is similar. Let
$\mathrm{prev}(b, \pi^{i-1}) = a$ if defined, let $\mathrm{next}(c, \pi^{i-1}) = d$ if defined.

The $t_i$ defragment phase moves made here each add an element to $Q_i$. These
added elements are all adjacencies, are already present in $P_{i-1}$, continue to be
present in $P_i$, and are not present in $Q_{i-1}$. Thus there is a contribution of $-t_i$
to $\phi_i - \phi_{i-1}$.

After the defragment phase, each of the points $\{a, c, e\}$ contributes $-1$ or 0
or $+1$ to $\phi_i - \phi_{i-1}$. No other points contribute. For an $x \in \{a, c, e\}$ to contribute
$+1$, it must hold that $x \in P_i \setminus Q_i$ (and that $x \in Q_{i-1}$ or $x \notin P_{i-1}$).

If $e \in P_i \setminus Q_i$, then the canonical move must have placed $c$ to the left of
$f = c + 1$. So $e < b \le c < f$, but $e$ and $b$ are in different sequences. Thus
$e \in P_{i-1} \setminus Q_{i-1}$. Hence $e$ cannot ever contribute a $+1$.

If $c \in P_i \setminus Q_i$, then the canonical move must have placed $b$ to the right
of $e = b - 1$. So $e < b \le c < f$, but $c$ and $f$ are in different sequences. So
$e \in P_{i-1} \setminus Q_{i-1}$ and $e \in Q_i$. Thus $e$ contributes a $-1$, negating a $+1$ possibly
contributed by $c$.

It follows that c and e together contribute at most 0. Since a contributes at 
most +1, the net contribution from {a,c,e} together is at most +1. 

Thus at the $i$th stage, the total increase in potential $\phi_i - \phi_{i-1}$ is at most $-t_i$
from the defragment phase plus 1 from the last move. That is, $\phi_i - \phi_{i-1} \le 1 - t_i$.
Hence $a_i = c_i + \phi_i - \phi_{i-1} \le 1 + t_i + 1 - t_i = 2$. □



Lemma 7. $s(\pi) \le m(\mathcal{S}_\pi)$.

Proof. Consider any sequence of strip moves merging $\mathcal{S}_\pi$. We perform these
moves directly on $\pi$. We maintain at each step the invariant that composing the
sequences of $\mathcal{S}_i$ gives $\pi^i$. Thus a strip being moved in the sorting sequence for
$\mathcal{S}_\pi$ is always a strip or a substrip in the corresponding permutation. (Why would
it ever be a substrip? As in the proof of the previous Lemma, strips of $\pi$ could
merge because intervening substrings move out, but they will not merge in $\mathcal{S}$
if they belonged to different sequences to begin with.) So we can conclude that
$s'(\pi) \le m(\mathcal{S}_\pi)$. The result now follows from Lemma 1. □



7 Conclusion

Our main contribution has been to present a variant, SBSM, of the well-studied 
sorting by transposition problem SBT, and to describe a polynomial time 2- 
approximation algorithm for SBSM. 

There are several open questions. The most important ones are: (1) What
is the complexity of SBSM? (Note that this question is still open for SBT as
well.) and (2) How do strip moves relate to transpositions? Does SBT reduce
to SBSM? Does $s(\pi)$ approximate $t(\pi)$ to any factor better than 3?



Abstract. Every composition of macro tree transducers that is a function
of linear size increase can be realized by just one macro tree transducer.
For a given composition it is decidable whether or not it is of
linear size increase; if it is, then an equivalent macro tree transducer can
be constructed effectively.



1 Introduction 

The macro tree transducer (mtt) [EV85,FV98] is a well-known model for syntax- 
directed translation. Roughly speaking, mtts are primitive recursive programs 
(with parameters) which take trees as input and compute trees as output. Due 
to the advent of XML there is a new interest in tree transducers [Via01] and
particularly in the mtt [EM03a,PS]. As will be explained below, the result of this
paper provides a technique for optimizing XML queries and transformations (as 
they appear in present day implementations). 

Consider the sequential composition of (total deterministic) mtts. It gives 
rise to a hierarchy which is strict at each level. This means that $(n + 1)$-fold
compositions of mtts can do more translations than n-fold compositions. In this 
paper it is proved that if we restrict the translations to linear size increase (i.e., 
tree functions for which the size of each output tree is linearly bounded by the 
size of the corresponding input tree) then composition of mtts does not give rise 
to a proper hierarchy: Every composition of mtts that is of linear size increase 
can be realized by just one mtt. The second main result is that, for a composition 
of mtts, it is decidable whether or not it is of linear size increase. In case it is, 
an equivalent mtt can be constructed effectively. 

Intuitively, the decidability result means that for compositions of mtts we 
can tell the “good” translations (viz. those of linear size increase) from the bad 
ones. Note that the good tree translations can be computed (on a RAM) in linear 
time (in the size of the input tree), due to the fact that every composition of 
mtts can be computed in time linear in the sum of the sizes of the corresponding 
input and output trees [Man02]. The “bad” translations can of course not be 
computed in linear time. 

In terms of XML processing our decidability result implies an algorithm for
automatic query optimization which can be applied to the known XML query
and transformation languages. Examples of such languages are the eXtensible
Stylesheet Language XSL (now called XSLT; see http://www.w3.org/Style/XSL
and [BMN02] for a formalization in terms of tree transducers), XQuery (see 
http://www.w3.org/XML/Query) and fxt [BS02]. A formal model for the “tree 
transformation core” of all these languages is the pebble tree transducer which 
was introduced in [MSV03]. In [EM03a] it was shown that pebble tree transduc- 
ers can be simulated by compositions of mtts. 

Many real world XML transformations are rather simple and are in particular 
of linear size increase. Our result implies that these transformations, no matter 
how badly programmed (i.e., by using many intermediate results), can be auto- 
matically transformed into highly efficient translations that are computable in 
linear time. In fact, the algorithm finds an equivalent linear time transformation, 
if and only if it exists. 

Besides low complexity, the class of good tree translations, viz. the class of 
macro tree translations of linear size increase, has many more attractive proper- 
ties. For instance, it can be characterized in several natural ways: In [EM03b] it 
was proved that it equals the class of finite copying macro tree translations, which 
is equal to the class of translations realized by single use restricted attributed 
tree transducers (with look-ahead) [EM99] of Giegerich and Ganzinger [Gie88,
GG84], which is equal to the class of tree translations definable in monadic
second-order (MSO) logic [BE00] of Courcelle, cf. [Cou94]. Another attractive
property of this class is its closure under composition (first proved in [Gie88];
see also [Küh97] for a proof in the setting of tree transducers).

Let us now consider the proof of the first main result. In [EM03b] it was 
proved that the restriction of macro tree translations to linear size increase 
equals the class of MSO definable tree translations. It was mentioned in the 
conclusions of that paper as an open problem, whether or not this result could 
be extended to compositions of mtts. At that time it was not clear whether such 
an extension would hold at all, not even when considering the compositions of 
just two mtts. It seemed that a similar sequence of involved pumping arguments 
as used in [EM03b] would be needed, and therefore we had given up hope for 
the proof of an extension. 

The surprising news of this paper is that no pumping arguments are needed 
whatsoever. Rather, the result is a consequence of the results of [Man02]:
Consider a composition of mtts. The result of [Man02] says that we can change these
transducers in such a way that no garbage ever occurs in intermediate trees, i.e., 
each symbol in the output tree of a transducer in the composition is needed for 
the computation of the final output tree. The “garbage-free” tree transducers, 
except the first, are of linear bounded input: each output tree is at most linearly
smaller than the corresponding input tree. Moreover, as also shown in [Man02], 
they can be assumed to be linear, in the usual sense that they do not copy the 
input tree. Now consider a composition of mtts that is of linear size increase. By 
the result of [Man02] we can change the transducers in such a way that the size 
of each intermediate result is linearly bounded by the size of the final output 
tree. Then, roughly speaking, we can construct one new mtt which computes all
intermediate results “on the fly”. In fact, it is easy to show that if a composition
$f \circ g$ is of linear size increase, and $g$ is of linear bounded input, then also $f$ must
be of linear size increase. This implies that the first mtt of the composition must 
be of linear size increase, and hence realizes an MSO definable tree translation 
by the results of [EM03b]. This translation can now be composed with the next 
mtt of the composition (due to its linearity) into a new mtt, etcetera. Thus we 
obtain the desired result: the restriction of compositions of mtts to linear size 
increase gives precisely the class of MSO definable tree translations. 

We now briefly discuss the proof of the decidability result. In [EM03b] it is
proved that it is decidable for an mtt whether or not its translation is of linear
size increase. This result can now be extended to compositions of mtts: Let $\tau$
be an $n$-fold composition of mtts. We sketch the decision procedure for linear
size increase: First, decompose $\tau$ into an mtt $M$ followed by $n - 1$ mtts that are
linear and of linear bounded input. Now check whether or not $M$ is of linear size
increase. If it is not then we are finished and know that $\tau$ is not of linear size
increase. If $M$ is of linear size increase, then we can construct an equivalent single
use restricted attributed tree transducer. This transducer can now be composed
with the first of the $n - 1$ mtts into a new mtt. Hence, we have reduced the
checking of linear size increase from $n$-fold to $(n - 1)$-fold compositions of mtts.

2 Preliminaries 

Unless mentioned otherwise, a function $f : A \to B$ is always total, i.e., its
domain is $A$. For functions $f : A \to B$ and $g : B \to C$ their composition is
$(f \circ g)(x) = g(f(x))$; note that the order is nonstandard. For sets of functions $F$
and $G$ their composition is $F \circ G = \{f \circ g \mid f \in F,\ g \in G\}$, and $F^n = F \circ \cdots \circ F$
($n$ times, $n \geq 1$). Furthermore, $F^*$ denotes the union $\bigcup_{n \geq 1} F^n$.
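The nonstandard composition order is easy to get wrong, so here is a minimal Python sketch of it. The helper names `compose` and `power` are illustrative and not taken from the paper.

```python
from functools import reduce
from typing import Callable


def compose(f: Callable, g: Callable) -> Callable:
    """Composition in the paper's order: (f o g)(x) = g(f(x)), so f runs first."""
    return lambda x: g(f(x))


def power(f: Callable, n: int) -> Callable:
    """n-fold composition f o ... o f, defined for n >= 1 only."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return reduce(compose, [f] * n)


double = lambda x: 2 * x
succ = lambda x: x + 1

print(compose(double, succ)(5))  # double first, then succ
print(compose(succ, double)(5))  # succ first, then double
print(power(double, 3)(1))       # double applied three times
```

With this convention a pipeline reads left to right, which matches how the later decompositions of transducer compositions are written.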

We assume the reader to be familiar with trees, tree automata, and tree
translations (see, e.g., [GS97]). A set $\Sigma$ together with a mapping $\mathrm{rank} : \Sigma \to \mathbb{N}$
is called a ranked set. For $k \geq 0$, $\Sigma^{(k)}$ is the set $\{\sigma \in \Sigma \mid \mathrm{rank}(\sigma) = k\}$;
we also write $\sigma^{(k)}$ to denote that $\mathrm{rank}(\sigma) = k$. If $\Sigma = \Sigma^{(0)} \cup \Sigma^{(1)}$ then $\Sigma$ is
monadic. For a set $A$, $\langle \Sigma, A \rangle$ is the ranked set $\{\langle \sigma, a \rangle \mid \sigma \in \Sigma,\ a \in A\}$ with
$\mathrm{rank}(\langle \sigma, a \rangle) = \mathrm{rank}(\sigma)$. The set of all (ordered, ranked) trees over $\Sigma$ is denoted
$T_\Sigma$. The size of a tree $t$, i.e., its number of nodes, is denoted $|t|$. As an example,
the tree $\sigma(a, \sigma(a, b)) \in T_\Sigma$ with $\sigma \in \Sigma^{(2)}$ and $a, b \in \Sigma^{(0)}$ has size 5. For a set $A$,
$T_\Sigma(A)$ is the set of all trees over $\Sigma \cup A$, where all elements in $A$ have rank zero.
We fix the set of input variables as $X = \{x_1, x_2, \ldots\}$ and the set of parameters
as $Y = \{y_1, y_2, \ldots\}$. For $k \geq 0$, $X_k = \{x_1, \ldots, x_k\}$ and $Y_k = \{y_1, \ldots, y_k\}$.
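As a small illustration of trees and their size, the following Python sketch represents a tree as a label with a tuple of children and recomputes the size of the example tree. The `Tree` class and the label `"sigma"` are assumptions made for the sketch only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tree:
    """An ordered tree: a label and its children; rank = number of children."""
    label: str
    children: Tuple["Tree", ...] = ()


def size(t: Tree) -> int:
    """Number of nodes of t."""
    return 1 + sum(size(child) for child in t.children)


a, b = Tree("a"), Tree("b")
t = Tree("sigma", (a, Tree("sigma", (a, b))))  # the tree sigma(a, sigma(a, b))
print(size(t))
```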

Linear Size Increase and Linear Bounded Input. Let $A$ and $B$ be sets
such that each element $d$ of $A \cup B$ has associated with it a size, which is a natural
number denoted by $|d|$. A function $f : A \to B$ is of

- linear size increase if $\exists c \in \mathbb{N}\ \forall a \in A : |f(a)| \leq c\,|a|$ and

- linear bounded input if $\exists c \in \mathbb{N}\ \forall a \in A : |a| \leq c\,|f(a)|$.

The corresponding classes of functions are denoted by LSI and LBI, respectively.
Note that (trivially) these classes are closed under composition.

Lemma 1. Let $A$, $B$, $C$ be sets and $f : A \to B$ and $g : B \to C$ functions.

If (1) $f \circ g \in \mathrm{LSI}$ and (2) $g \in \mathrm{LBI}$, then $f \in \mathrm{LSI}$.

Proof. By (2), there is a $c > 0$ such that $\forall b \in B$, $|b| \leq c\,|g(b)|$. Hence, in
particular, $\forall a \in A$, $|f(a)| \leq c\,|g(f(a))|$. By (1), there is a $c' > 0$ such that
$\forall a \in A$, $|g(f(a))| \leq c'\,|a|$. Hence, $\forall a \in A$, $|f(a)| \leq c c'\,|a|$, which means that
$f \in \mathrm{LSI}$. □
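To make the constants in Lemma 1 concrete, here is a toy Python check on strings, with length as the size. The functions `f` and `g` and the sample set are invented for illustration; checking finitely many samples does not prove membership in LSI or LBI, it only traces how the witness $c c'$ arises.

```python
def f(a: str) -> str:
    """Toy first function: doubles the size of its input."""
    return a + a


def g(b: str) -> str:
    """Toy second function: keeps every other character (halves the size)."""
    return b[::2]


def size_increase_ok(h, c, xs):
    """Check |h(x)| <= c * |x| on the given samples."""
    return all(len(h(x)) <= c * len(x) for x in xs)


def bounded_input_ok(h, c, xs):
    """Check |x| <= c * |h(x)| on the given samples."""
    return all(len(x) <= c * len(h(x)) for x in xs)


samples = ["x", "xy", "xyz", "hello", "a" * 17]
c = 2        # witness for premise (2): g is of linear bounded input
c_prime = 1  # witness for premise (1): f o g is of linear size increase

premise_1 = size_increase_ok(lambda a: g(f(a)), c_prime, samples)
premise_2 = bounded_input_ok(g, c, [f(a) for a in samples])
conclusion = size_increase_ok(f, c * c_prime, samples)
print(premise_1, premise_2, conclusion)
```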

Lemma 2. Let $A$, $B$, $C$ be sets and $f : A \to B$ and $g : B \to C$ functions.

If (1) $f \circ g \in \mathrm{LSI}$, (2) $f \in \mathrm{LBI}$, and (3) $f$ is surjective, then $g \in \mathrm{LSI}$.

Proof. Let $c$ be the constant in (1) and let $c'$ be the constant in (2). By (3),
$\forall b \in B\ \exists a \in A : f(a) = b$. Hence, for all $b = f(a) \in B$, $|g(b)| = |g(f(a))| \leq
c\,|a| \leq c c'\,|f(a)| = c c'\,|b|$, which means that $g \in \mathrm{LSI}$. □

3 Tree Transducers 

In this section, macro tree transducers (mtts) are defined, two examples of them 
are given, and a number of results about them are cited which are needed to 
prove our main result. Furthermore, we prove in Lemma 5 a new result about 
linear mtts. 

Definition 1. A (total, deterministic) macro tree transducer (mtt) is a tuple
$M = (Q, \Sigma, \Delta, q_0, R)$, where $Q$ is a ranked alphabet of states, $\Sigma$ and $\Delta$ are
ranked alphabets of input and output symbols, respectively, $q_0 \in Q^{(0)}$ is the
initial state, and $R$ is a finite set of rules. For every $q \in Q^{(m)}$ and $\sigma \in \Sigma^{(k)}$ with
$m, k \geq 0$ there is exactly one rule of the form

$\langle q, \sigma(x_1, \ldots, x_k) \rangle (y_1, \ldots, y_m) \to \zeta$

in $R$, where $\zeta \in T_{\langle Q, X_k \rangle \cup \Delta}(Y_m)$; the tree $\zeta$ is denoted by $\mathrm{rhs}_M(q, \sigma)$. □

The rules of $M$ are used as term rewriting rules in the usual way. The derivation
relation of $M$ (on $T_{\langle Q, T_\Sigma \rangle \cup \Delta}$) is denoted by $\Rightarrow_M$ and the translation realized
by $M$, denoted $\tau_M$, is the (total) function $\{(s, t) \in T_\Sigma \times T_\Delta \mid \langle q_0, s \rangle \Rightarrow_M^* t\}$. The
class of all translations that can be realized by mtts is denoted by MTT. The mtt
$M$ is linear, if the right-hand side of every rule is linear in the input variables $X_k$.
The corresponding class of translations is denoted by LMTT. A (total, deterministic)
top-down tree transducer is an mtt all states of which are of rank zero.
The class of all translations that can be realized by top-down tree transducers
is denoted by T, and by LT for linear top-down tree transducers.

Example 1. (1) Let $\Sigma$ be a ranked alphabet. We now define the mtt $M_\Sigma$ which
translates a tree $s$ into a monadic tree $t$ such that $t$ contains $s$ in prefix (Polish)
notation; e.g., $\sigma(a, \sigma(\sigma(b, c), d))$ is translated into $\sigma(a(\sigma(\sigma(b(c(d(e)))))))$.







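Let $M_\Sigma = (Q, \Sigma, \Sigma', q_0, R)$ with $Q = \{q_0^{(0)}, q^{(1)}\}$ and $\Sigma' = \{\sigma^{(1)} \mid \sigma \in \Sigma\} \cup \{e^{(0)}\}$,
where $e$ is a new symbol not in $\Sigma$. For every $\sigma \in \Sigma^{(k)}$, $k \geq 0$, the following rules
are in $R$.

$\langle q_0, \sigma(x_1, \ldots, x_k) \rangle \to \sigma(\langle q, x_1 \rangle(\langle q, x_2 \rangle(\cdots \langle q, x_k \rangle(e) \cdots)))$

$\langle q, \sigma(x_1, \ldots, x_k) \rangle (y_1) \to \sigma(\langle q, x_1 \rangle(\langle q, x_2 \rangle(\cdots \langle q, x_k \rangle(y_1) \cdots)))$

Let $\Sigma = \{\sigma^{(2)}, a^{(0)}, b^{(0)}\}$ and $s = \sigma(a, b)$. Then

$\langle q_0, s \rangle \Rightarrow \sigma(\langle q, a \rangle(\langle q, b \rangle(e))) \Rightarrow \sigma(a(\langle q, b \rangle(e))) \Rightarrow \sigma(a(b(e)))$.

Clearly, $|\tau_{M_\Sigma}(s)| = |s| + 1$. Thus, $M_\Sigma$ is of linear size increase, i.e., $\tau_{M_\Sigma} \in \mathrm{LSI}$.
Note also that $M_\Sigma$ is linear.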
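The two rule schemes of Example 1(1) can be run directly. The Python sketch below assumes a tree encoding as `(label, children)` tuples; the functions `q`, `q0`, `show` and `size` are illustrative names, with `q` taking the parameter $y_1$ as its second argument.

```python
E = ("e", ())  # the fresh end marker e of rank zero


def q(s, y):
    """State q with one parameter: emit the label, then thread y through the children."""
    label, children = s
    acc = y
    for child in reversed(children):  # the innermost call is the one for x_k
        acc = q(child, acc)
    return (label, (acc,))


def q0(s):
    """Initial state: same right-hand side as q, with e in place of y_1."""
    return q(s, E)


def show(t):
    label, children = t
    return label if not children else f"{label}({','.join(map(show, children))})"


def size(t):
    return 1 + sum(size(c) for c in t[1])


leaf = lambda name: (name, ())
s = ("sigma", (leaf("a"), leaf("b")))
print(show(q0(s)))            # prefix notation of s as a monadic tree
print(size(q0(s)), size(s))   # output has exactly one node more than the input
```

Each input subtree is visited by exactly one state call, which is the linearity noted in the example.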

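(2) Let $\Delta$ be a monadic ranked alphabet. The mtt $N_\Delta$ translates a monadic
tree $s$ of size $n$ into the monadic tree $a^{2^{n-1}}(e)$ of size (and height) $2^{n-1} + 1$.
Define $N_\Delta = (Q, \Delta, \Gamma, q_0, R)$ with $Q = \{q_0^{(0)}, q^{(1)}\}$ and $\Gamma = \{a^{(1)}, e^{(0)}\}$. For
every $\delta \in \Delta^{(1)}$ and $\alpha \in \Delta^{(0)}$ let the following rules be in $R$.

$\langle q_0, \delta(x_1) \rangle \to \langle q, x_1 \rangle(\langle q, x_1 \rangle(e))$

$\langle q_0, \alpha \rangle \to a(e)$

$\langle q, \delta(x_1) \rangle (y_1) \to \langle q, x_1 \rangle(\langle q, x_1 \rangle(y_1))$

$\langle q, \alpha \rangle (y_1) \to a(y_1)$

Consider $\Delta = \Gamma$ and $s = aae$. Then $\langle q_0, aae \rangle \Rightarrow \langle q, ae \rangle(\langle q, ae \rangle(e))
\Rightarrow \langle q, e \rangle(\langle q, e \rangle(\langle q, ae \rangle(e))) \Rightarrow^* a(a(\langle q, ae \rangle(e))) \Rightarrow a(a(\langle q, e \rangle(\langle q, e \rangle(e))))
\Rightarrow^* a(a(a(a(e))))$.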
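The exponential growth in Example 1(2) comes from the state $q$ calling itself twice on the same input subtree. A minimal Python sketch, again assuming `(label, children)` tuples and illustrative function names:

```python
E = ("e", ())  # output leaf e


def q(s, y):
    """State q: a leaf emits a(y); a unary node calls q twice on its only subtree."""
    label, children = s
    if not children:
        return ("a", (y,))
    (x1,) = children
    return q(x1, q(x1, y))


def q0(s):
    """Initial state: both q0-rules coincide with the q-rules for y_1 = e."""
    return q(s, E)


def size(t):
    return 1 + sum(size(c) for c in t[1])


def monadic(n):
    """Monadic input tree of size n: n - 1 unary symbols above one leaf."""
    t = ("e", ())
    for _ in range(n - 1):
        t = ("a", (t,))
    return t


for n in range(1, 6):
    print(n, size(q0(monadic(n))))  # sizes follow 2**(n - 1) + 1
```

Since the input subtree `x1` is used twice, this transducer is not linear, in contrast to the transducer of part (1).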

Linear Macro Tree Transducers. A main idea of the complexity proof given 
in [Man02] is the use of the class LMTT of linear macro tree translations. This 
class is nicely situated in the world of tree transducers: it is powerful enough so 
that its composition closure equals that of mtts (because it can simulate tree 
homomorphisms), but it is weak enough in order to be simulated by attributed 
tree transducers (with look-ahead). Both of these results are new, but only the 
latter is needed for this paper. 

The attributed tree transducer (att) [Fül81] is essentially the attribute grammar
of Knuth [Knu68], but restricted to the generation of trees as output (and
with tree top-concatenation as only operation). The corresponding class of tree
translations is denoted ATT.

In general, mtts have much nicer closure properties than atts (cf. [EM02]).
However, with respect to pre-composition with the class MSOTT of MSO definable
tree translations the situation is the converse: ATT is closed and MTT
is not. To be precise, the atts must be equipped with a look-ahead facility. As
argued in [BE00], the corresponding class $\mathrm{ATT}^R$ is more natural and robust
than ATT. In [EM03b] it was proved that this look-ahead can be simulated by
a (total deterministic) bottom-up relabeling followed by a (total deterministic)
top-down relabeling, or, by a top-down relabeling with look-ahead:

$\mathrm{ATT}^R = \text{B-REL} \circ \text{T-REL} \circ \mathrm{ATT} = \text{T-REL}^R \circ \mathrm{ATT}$.

The following lemma is proved in Proposition 2 and Theorem 17 of [BE00].

Lemma 3. $\mathrm{MSOTT} \circ \mathrm{ATT}^R = \mathrm{ATT}^R$.

Let us now show that indeed $\mathrm{MSOTT} \circ \mathrm{MTT} \not\subseteq \mathrm{MTT}$ (note that the addition
of look-ahead to mtts does not change the situation: mtts are closed under
pre-composition with both B-REL and T-REL). This can be done using the mtts
of Example 1. Let $\Sigma = \{\sigma^{(2)}, a^{(0)}\}$ and let $M_\Sigma = (Q, \Sigma, \Sigma', q_0, R)$ be the mtt
defined in Example 1. Let $s_n$ be a full binary input tree of height $n$. Then $\tau_{M_\Sigma}(s_n)$
has height $2^n$. The translation $\tau_{M_\Sigma}$ is MSO definable because $M_\Sigma$ is of linear size
increase (see Lemma 4). Now, let $N_{\Sigma'}$ be the mtt defined in Example 1. Since
$N_{\Sigma'}$ has exponential size increase, $\tau_{M_\Sigma} \circ \tau_{N_{\Sigma'}}$ has double exponential height
increase, and therefore cannot be realized by any mtt (Theorem 3.24 of [EV85]).

As mentioned in the Introduction, MSOTT has various characterizations: By
Theorem 7.2 of [EM03b] it is equal to the restriction of MTT to linear size
increase. By Theorem 17 of [BE00] it equals the class $\mathrm{ATT}^R_{\mathrm{sur}}$ of single use restricted
atts with look-ahead. This restriction is known from attribute grammars [Gie88,
GG84] and requires that each attribute may be used at most once in the
set of attribute rules for one input symbol (the input symbol corresponds in an
attribute grammar to a production of the context-free grammar; in this sense
our ranked trees can be seen as abstract syntax trees).

Lemma 4. $\mathrm{MSOTT} = \mathrm{MTT} \cap \mathrm{LSI} = \mathrm{ATT}^R_{\mathrm{sur}}$.

Note that Lemmas 3 and 4 together imply that $\mathrm{ATT}^R_{\mathrm{sur}} \circ \mathrm{ATT}^R = \mathrm{ATT}^R$.
Direct constructions for this result (cum grano salis) can be found in [Gie88,
Küh97]. We now use this fact to prove the main result of this section, namely,
that linear mtts can be simulated by atts with look-ahead.

Lemma 5. $\mathrm{LMTT} \subseteq \mathrm{ATT}^R$.

Proof. By Lemma 4.34(2) of [FV98], every linear mtt can be decomposed into
a linear top-down tree transducer followed by a “YIELD-mapping”. The latter
interprets trees in an algebra of tree substitutions. It is well known that
YIELD-mappings can be realized by atts (see, e.g., Corollary 6.24 of [FV98]). Hence,
$\mathrm{LMTT} \subseteq \mathrm{LT} \circ \mathrm{ATT}$. Unfortunately, linear top-down tree transducers are not
single use restricted (when seen as atts, or using the definition of [Küh97]).
However, a slightly more general notion of single use for mtts (and top-down tree
transducers) is given in Definition 5.7 of [EM99]; below that definition it is noted
that LT is indeed included in the class of (generalized) single use restricted
top-down tree transducers. Moreover, every such restricted top-down tree transducer
can be simulated by a top-down relabeling (with look-ahead) followed by a
top-down tree transducer which fulfills the conventional definition of single use (see
Definition 5.5 of [EM99]), and which therefore can be simulated by a single use
restricted att. Hence, LT is included in $\text{T-REL}^R \circ \mathrm{ATT}_{\mathrm{sur}}$, which equals $\mathrm{ATT}^R_{\mathrm{sur}}$.
We obtain $\mathrm{LT} \circ \mathrm{ATT} \subseteq \mathrm{ATT}^R_{\mathrm{sur}} \circ \mathrm{ATT}$. By Lemmas 3 and 4 the latter is included
in $\mathrm{ATT}^R$. □







Decidability. In Theorem 7.3 of [EM03b] it is proved that for an mtt (with look- 
ahead) it is decidable whether or not it is of linear size increase. The proof uses 
a normal form for which it can be shown that an mtt in normal form is of linear 
size increase if and only if it is finite copying. Finite copying means that every 
node of the input tree is translated only by a bounded number of states (called 
“finite copying in the input”), and that each parameter of a state is only copied 
a bounded number of times (called “finite copying in the parameters”). For an 
mtt M it is decidable whether or not it is finite copying. Roughly speaking, this 
is done as follows. Change M in such a way that it has a new input symbol $ of 
rank zero, and blocks at a node u if it is labeled by $ (by outputting its state); the 
corresponding output tree contains all “state calls” on u. Then a second mtt can 
be used to fish out all these state calls, and put them into a monadic tree. Then, 
the height of this output tree equals the number of states that are translating 
the node u of the input tree s. Hence, the range of this transducer is finite if 
and only if M is finite copying in the input. Finiteness of ranges of compositions 
of mtts is decidable by [DE98] . Deciding finite copying in the parameters works 
similarly. 

Lemma 6. (Theorem 7.3 of [EM03b]) It is decidable for a macro tree transducer 
whether or not it is of linear size increase; if it is then an equivalent finite copying 
mtt can be constructed effectively. 



Linear Bounded Input. As mentioned in the Introduction, every composition
of mtts can be made “garbage-free”. In [Man02], the garbage-free transducers are
called “productive” (indicated by subscript “prod”). In particular, Theorem 12
of [Man02] states that

$\mathrm{MTT}^{n+1} \subseteq \text{B-REL} \circ \mathrm{LT} \circ \mathrm{T}_{\mathrm{prod}} \circ \mathrm{LMTT}^{n+1}_{\mathrm{prod}}$.

Now $\text{B-REL} \circ \mathrm{LT} \circ \mathrm{T}_{\mathrm{prod}} \circ \mathrm{LMTT} \subseteq \mathrm{MTT}$ because MTT is closed under left composition
with top-down tree transducers by Corollary 4.10 of [EV85], and under
regular look-ahead (= left-composition with the class B-REL) by Theorem 4.21
of [EV85]. We obtain $\mathrm{MTT}^{n+1} \subseteq \mathrm{MTT} \circ \mathrm{LMTT}^{n}_{\mathrm{prod}}$. By Lemma 11 of [Man02]
the productive mtts are of linear bounded input: $\mathrm{LMTT}_{\mathrm{prod}} \subseteq \mathrm{LBI}$. We obtain
the following lemma.

Lemma 7. For every $n > 0$, $\mathrm{MTT}^{n+1} \subseteq \mathrm{MTT} \circ (\mathrm{LMTT} \cap \mathrm{LBI})^n$.

4 Main Results 

Here our main results are proved: First the collapse of the mtt hierarchy and 
then the decidability result. Finally we prove that also the nondeterministic mtt 
hierarchy collapses, if restricted to functions of linear size increase. 

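Theorem 1. The mtt hierarchy collapses for functions of linear size increase:
$\mathrm{MTT}^* \cap \mathrm{LSI} \subseteq \mathrm{MTT}$.

Proof. Let $\tau \in \mathrm{MTT}^{n+1} \cap \mathrm{LSI}$ with $n \geq 1$. By Lemma 7, $\mathrm{MTT}^{n+1} \subseteq
\mathrm{MTT} \circ (\mathrm{LMTT} \cap \mathrm{LBI})^n$. As mentioned in the Preliminaries, LBI is closed under
composition. Thus, $(\mathrm{LMTT} \cap \mathrm{LBI})^n \subseteq \mathrm{LMTT}^n \cap \mathrm{LBI}$ and we obtain

$\mathrm{MTT}^{n+1} \subseteq \mathrm{MTT} \circ (\mathrm{LMTT}^n \cap \mathrm{LBI})$.

This means that there are functions $f$ and $g$ such that $\tau = f \circ g$, $f \in \mathrm{MTT}$,
and $g \in \mathrm{LMTT}^n \cap \mathrm{LBI}$. Since $\tau = f \circ g \in \mathrm{LSI}$, it follows from Lemma 1 that
$f \in \mathrm{LSI}$. Hence, by Lemma 4, $f \in \mathrm{MSOTT}$. Moreover, by Lemmas 3 and 5,
$\mathrm{MSOTT} \circ \mathrm{LMTT} \subseteq \mathrm{ATT}^R$. It follows from Theorem 4.4 of [EM99] that $\mathrm{ATT}^R \subseteq
\text{B-REL} \circ \mathrm{T} \circ \mathrm{ATT}$ and from Lemma 5.11 of that paper (and the closure properties
of MTT mentioned at the end of Section 3) it follows that $\mathrm{ATT} \subseteq \mathrm{MTT}$. Hence,
$\mathrm{ATT}^R \subseteq \text{B-REL} \circ \mathrm{T} \circ \mathrm{MTT} \subseteq \mathrm{MTT}$. Altogether, we obtain that

\begin{align*}
\mathrm{MTT}^{n+1} \cap \mathrm{LSI}
 &\subseteq \mathrm{MTT} \circ (\mathrm{LMTT}^n \cap \mathrm{LBI}) \cap \mathrm{LSI} && \text{(by Lemma 7)} \\
 &\subseteq (\mathrm{MTT} \cap \mathrm{LSI}) \circ \mathrm{LMTT}^n \cap \mathrm{LSI} && \text{(by Lemma 1)} \\
 &\subseteq \mathrm{MSOTT} \circ \mathrm{ATT}^R \circ \mathrm{MTT}^{n-1} \cap \mathrm{LSI} && \text{(by Lemmas 4 and 5)} \\
 &\subseteq \mathrm{ATT}^R \circ \mathrm{MTT}^{n-1} \cap \mathrm{LSI} && \text{(by Lemma 3)} \\
 &\subseteq \mathrm{MTT}^n \cap \mathrm{LSI} && \text{(by [EM99])} \\
 &\subseteq \mathrm{MTT}. && \text{(by induction)} \quad \Box
\end{align*}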

Theorem 2. For a composition of mtts it is decidable whether or not it is of 
linear size increase; if it is, then an equivalent mtt can be constructed effectively. 

Proof. Let $n \geq 1$ and $M_1, \ldots, M_n$ be mtts. Following Lemma 7 we first construct
an mtt $N_1$ and linear mtts $N_2, \ldots, N_n$ such that $\tau_{N_1} \circ \cdots \circ \tau_{N_n} = \tau_{M_1} \circ \cdots \circ \tau_{M_n}$,
and $N_2, \ldots, N_n$ are of linear bounded input.

By Lemma 6 there is an algorithm to decide linear size increase of an mtt.
It works by first applying normalize to obtain an equivalent mtt in normal
form and then by determining (using fc) if the resulting mtt is finite copying.
Furthermore, there is an algorithm compose which takes two mtts $M$, $M'$, where
$M$ is finite copying and $M'$ is linear, and returns a new mtt $N$ which realizes the
composition $\tau_M \circ \tau_{M'}$. This can be done by first constructing an equivalent single
use restricted mtt $N'$ and then an equivalent single use restricted att with look-ahead
$A_1$, following the constructions presented in [EM99]. Then construct for
$N_2$ an equivalent att with look-ahead $A_2$. This can be done following the proof
of Lemma 5; in fact, it should be straightforward to design a direct construction
for this composition task. Now compose the two atts $A_1$ and $A_2$; this can be
done following the construction of Ganzinger and Giegerich [Gie88, GG84] (see
also [Küh97] and Lemmas 3 and 4). Finally, turn the resulting att into an mtt
$N$, as described in the proof of Theorem 1.

Based on normalize, fc, and compose we are now ready to present the
algorithm lsi which takes as input the mtts $N_1, \ldots, N_n$ from above and returns
an equivalent mtt if the composition $\tau_{N_1} \circ \cdots \circ \tau_{N_n}$ is of linear size increase, and
otherwise returns an error message. It follows from Lemma 1 that if the mtt
$M$ (obtained by composing the first $i$ transducers) is not of linear size increase,
then the whole composition is not (because the remainder $\tau_{N_{i+1}} \circ \cdots \circ \tau_{N_n}$ is of
linear bounded input).

    lsi( n, ( N_1,..,N_n ) )
    {
      i:= 1; M:= N_1;
      while ( i < n and fc( M:= normalize( M ) ) )
      {
        i:= i+1; M:= compose( M, N_i );
      }
      if ( i = n and fc( M:= normalize( M ) ) ) then return( M );
      else exception( "error: not lsi at %i", i );
    }
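The control flow of lsi can be sketched in Python as follows. This is only a skeleton under explicit assumptions: `normalize`, `fc` and `compose` are passed in as parameters because their real constructions (from [EM03b] and [EM99]) are not reproduced here, and the toy model used to exercise the loop, where a "transducer" is just a growth exponent, is invented for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class NotLinearSizeIncrease(Exception):
    """Raised when the prefix composed so far is not finite copying."""


def lsi(transducers: Sequence[T],
        normalize: Callable[[T], T],
        fc: Callable[[T], bool],
        compose: Callable[[T, T], T]) -> T:
    """Fold N_1, ..., N_n into one transducer, checking finite copying before each step."""
    if not transducers:
        raise ValueError("need at least one transducer")
    m = transducers[0]
    for i, nxt in enumerate(transducers[1:], start=1):
        m = normalize(m)
        if not fc(m):  # by Lemma 1 the whole composition then fails as well
            raise NotLinearSizeIncrease(f"error: not lsi at {i}")
        m = compose(m, nxt)
    m = normalize(m)
    if not fc(m):
        raise NotLinearSizeIncrease(f"error: not lsi at {len(transducers)}")
    return m


# Toy stand-ins (hypothetical): a transducer is modelled by a growth exponent p,
# meaning output size ~ (input size) ** p. These are NOT the real algorithms.
toy_normalize = lambda p: p
toy_fc = lambda p: p <= 1
toy_compose = lambda p, r: p * r

print(lsi([1, 1, 1], toy_normalize, toy_fc, toy_compose))
```

Passing the three procedures as parameters keeps the loop independent of any concrete transducer representation.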



Nondeterministic Translations. In this subsection we want to show that even 
the nondeterministic mtt hierarchy collapses when restricted to (total) functions 
of linear size increase. The result is a direct consequence of the fact that restrict- 
ing n-fold compositions of nondeterministic macro tree translations to functions 
yields precisely the n-fold compositions of total deterministic macro tree trans- 
lations (in Theorem 3). 

Let us consider a nondeterministic mtt $M$ which has states $q$, $q'$ of rank one
and zero, respectively. Furthermore, let $a$ be an input symbol of rank zero and
let $M$ contain the three rules $\langle q, a \rangle(y_1) \to \sigma(y_1, y_1)$, $\langle q', a \rangle \to a$, and $\langle q', a \rangle \to b$.
Consider the sentential form $\xi = \langle q, a \rangle(\langle q', a \rangle)$. Since $M$ is nondeterministic,
it makes a difference in which order $\xi$ is evaluated: if we evaluate from inside-out
(IO, or “eager”, or call-by-value) then first a $\langle q', a \rangle$-rule is applied yielding
$\langle q, a \rangle(a)$ or $\langle q, a \rangle(b)$. Application of the $\langle q, a \rangle$-rule gives $\sigma(a, a)$ or $\sigma(b, b)$. On
the other hand, if we evaluate outside-in (OI, or “lazy”, or call-by-name) then
we obtain $\sigma(\langle q', a \rangle, \langle q', a \rangle)$ and finally all four trees $\sigma(x, y)$ with $x, y \in \{a, b\}$.
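The difference between the two derivation orders can be enumerated mechanically for this example. The Python sketch below hard-codes the three rules; output trees are rendered as strings, and the function names are illustrative.

```python
from itertools import product

# The two alternatives of the nondeterministic state q' on the input symbol a.
Q_PRIME_RESULTS = ("a", "b")


def io_results():
    """Inside-out: evaluate the argument once, then copy its value."""
    return {f"sigma({v},{v})" for v in Q_PRIME_RESULTS}


def oi_results():
    """Outside-in: copy the unevaluated call, then evaluate each copy independently."""
    return {f"sigma({x},{y})" for x, y in product(Q_PRIME_RESULTS, repeat=2)}


print(sorted(io_results()))
print(sorted(oi_results()))
```

In this example every IO result is also an OI result, but the two classes are incomparable in general, as stated below.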

For nondeterministic mtts it turns out that restricting the order of derivation
to OI is the same as putting no restriction (Corollary 3.13 of [EV85]). Moreover,
this class $\text{N-MTT}_{\mathrm{OI}}$ is incomparable to the class $\text{N-MTT}_{\mathrm{IO}}$ of nondeterministic
macro tree translations with IO order of derivation. Here, we want to consider
the class N-MTT of all nondeterministic macro tree translations; we define:

$\text{N-MTT} := \text{N-MTT}_{\mathrm{OI}} \cup \text{N-MTT}_{\mathrm{IO}}$.

In Corollary 6.12 and Lemma 5.5 of [EV85], respectively, it is proved that

(D1) $\text{N-MTT}_{\mathrm{OI}} = \mathrm{MTT} \circ \text{N-T}$ and (D2) $\text{N-MTT}_{\mathrm{IO}} \subseteq \text{N-T} \circ \mathrm{MTT}$,

where N-T denotes the class of translations realized by nondeterministic
top-down tree transducers. By the Lemma of [Eng78]: (F) it is possible to construct
for any nondeterministic top-down tree transducer $T$ a “deterministic cut”, i.e.,
a deterministic top-down tree transducer with look-ahead $\mathrm{dc}(T)$ such that $\mathrm{dc}(T)$
and $T$ have the same domain and $\tau_{\mathrm{dc}(T)} \subseteq \tau_T$ (note that the $\tau$'s are relations
now).

Based on D1, D2, and F, we now prove the desired result about the restriction
to functions of the nondeterministic mtt hierarchy. Let $\mathcal{F}$ denote the class
of all (total) functions.

Theorem 3. For every $n \geq 1$, $\text{N-MTT}^n \cap \mathcal{F} \subseteq \mathrm{MTT}^n$.

Proof. Let $\tau \in \text{N-MTT}^n \cap \mathcal{F}$. By D1 and D2 we can decompose $\tau$ into $\tau =
\tau_{T_1} \circ \tau_{M_1} \circ \tau_{T_2} \circ \cdots \circ \tau_{T_{2n-1}} \circ \tau_{M_n} \circ \tau_{T_{2n}}$, where $T_i$ is a nondeterministic top-down tree
transducer and $M_i$ a (total deterministic) mtt. We may assume that each $T_i$ only
generates trees in the domain of the remaining translation that follows it (because
that domain is regular, by Theorem 7.4 of [EV85]). Thus, since $\tau$ is a function,
it should be clear that $\tau = \tau_{\mathrm{dc}(T_1)} \circ \tau_{M_1} \circ \tau_{\mathrm{dc}(T_2)} \circ \cdots \circ \tau_{\mathrm{dc}(T_{2n-1})} \circ \tau_{M_n} \circ \tau_{\mathrm{dc}(T_{2n})}$,
where $\mathrm{dc}(T_i)$ is constructed from $T_i$ according to F. Thus, $\tau$ is now represented
by deterministic transducers only.

Since $\tau$ is total, we can make the transducers $\mathrm{dc}(T_i)$ total by adding dummy
rules. Thus $\tau_{\mathrm{dc}(T_i)} \in \mathrm{T}^R$, the class of translations realized by total deterministic
top-down tree transducers with look-ahead. Hence we obtain that $\tau \in (\mathrm{T}^R \circ
\mathrm{MTT} \circ \mathrm{T}^R)^n$. This class equals $\mathrm{MTT}^n$ because MTT is closed under left- and
right-composition with $\mathrm{T}^R$ by Corollary 4.10 and Theorem 4.21 of [EV85] and
by Corollary 12 of [EM02], respectively. □

Since the restriction to the class LSI of functions of linear size increase can be 
obtained by first restricting to functions and then restricting to LSI, we obtain 
the following corollary from Theorems 3 and 1. 

Corollary 1. The nondeterministic mtt hierarchy collapses for functions of linear
size increase: $\text{N-MTT}^* \cap \mathrm{LSI} \subseteq \mathrm{MTT}$.

5 XML Translations 

In this section we want to consider our results from the point of view of trans- 
lating XML documents. XML documents naturally correspond to (ordered) 
unranked trees. In contrast to that, our tree transducers operate on ranked 
trees. There are two ways of dealing with this problem: either to generalize the 
existing notions of tree transducers to the case of unranked trees, or to work on 
ranked encodings of unranked trees. The fc-pebble tree transducer of [MSV03] 
is an example of the second choice: even though their model is specifically de- 
signed to model XML transformations, they do work on ranked encodings of 
unranked trees. The most common way of coding unranked trees (also used, 
e.g., in [MSV03]) is the following. Given an unranked tree (seen as a graph, with 
two sorts of edges: child edges and sibling edges), a binary encoding is obtained 
by simply deleting all child edges to non-first children. Similarly, to decode a
binary tree, the edges to non-first children are added. Obviously, (S) the encoding
is surjective (in fact, it is a bijection). 
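A minimal Python sketch of this first-child/next-sibling encoding follows. It assumes that unranked trees are `(label, [children])` pairs, that a forest is a list of such trees, and that a missing child in the binary tree is represented by `None` (a ranked encoding would use a padding symbol of rank zero instead).

```python
def encode(forest):
    """Binary encoding: left child = first child, right child = next sibling."""
    if not forest:
        return None
    (label, children), *rest = forest
    return (label, encode(children), encode(rest))


def decode(binary):
    """Inverse of encode: re-attach the siblings along the right spine as children."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)


# An unranked tree a(b, c(d), e), given as a one-element forest.
doc = [("a", [("b", []), ("c", [("d", [])]), ("e", [])])]
enc = encode(doc)
print(enc)
print(decode(enc) == doc)
```

In this sketch `decode` undoes `encode`, and vice versa, on every forest and every binary tree, which mirrors the bijection noted in (S).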

Examples of the first choice are the top-down unranked tree transducer
of [MN00, Toz01], the XSLT$_0$ programs of [BMN02], or the macro forest
transducers of [PS]. It is a good indication for the robustness of a class of tree
transformations, if it is invariant to going from ranked to unranked trees, i.e., if the
decodings of the ranked translations coincide with the translations realized by
the unranked model. For top-down tree transducers this is not the case [MN00]:
the unranked top-down tree transducers are strictly more powerful than the
encodings of the ranked ones. Also for mtts this is not the case: as shown in [PS],
the macro forest transducer is strictly more powerful than the mtt working on
encodings of ranked trees. Hence, $\mathrm{ENC} \circ \mathrm{MTT} \circ \mathrm{DEC} \subsetneq \mathrm{MFT}$. They show moreover
that it is possible to simulate every macro forest transducer by the composition
of two mtts (on the encodings): $\mathrm{MFT} \subseteq \mathrm{ENC} \circ \mathrm{MTT}^2 \circ \mathrm{DEC}$.

Let us now consider a formalism that is invariant to the change from ranked 
to unranked trees: the regular tree languages. A convenient way of proving this 
is by the fact that the MSO definable (ranked/unranked) tree languages are the 
regular (ranked/unranked) tree languages. It is straightforward to show that the 
encoding and decoding described above, can be realized by the MSO transducer 
of [Cou94] (this is discussed in [MN00]). Since the inverses of MSO definable
translations preserve MSO definability, this immediately proves the invariance 
of regular tree languages to changing the rankedness. For the same reason, the 
MSO definable tree translations are invariant to changing the rankedness. 

In what follows we want to show that the restriction of mtts to linear size 
increase is invariant to changing the rankedness. More precisely, the restriction 
to linear size increase of the composition closure of the class MFT of (total
deterministic) macro forest translations [PS] gives precisely the class MSOFF of 
MSO definable (unranked) forest translations. 



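Theorem 4. The mft hierarchy collapses for functions of linear size increase:
$\mathrm{MFT}^* \cap \mathrm{LSI} = \mathrm{MSOFF} \subseteq \mathrm{MFT}$.

Proof.

\begin{align*}
\mathrm{MFT}^* \cap \mathrm{LSI}
 &\subseteq (\mathrm{ENC} \circ \mathrm{MTT}^2 \circ \mathrm{DEC})^* \cap \mathrm{LSI} && \text{(Theorem 9 of [PS])} \\
 &= \mathrm{ENC} \circ \mathrm{MTT}^* \circ \mathrm{DEC} \cap \mathrm{LSI} && \text{(because } \mathrm{DEC} \circ \mathrm{ENC} \text{ is the identity)} \\
 &\subseteq (\mathrm{ENC} \circ \mathrm{MTT}^* \cap \mathrm{LSI}) \circ \mathrm{DEC} && \text{(by Lemma 1)} \\
 &\subseteq \mathrm{ENC} \circ (\mathrm{MTT}^* \cap \mathrm{LSI}) \circ \mathrm{DEC} && \text{(by S and Lemma 2)} \\
 &\subseteq \mathrm{ENC} \circ \mathrm{MSOTT} \circ \mathrm{DEC} && \text{(by Theorem 1 and Lemma 4)} \\
 &= \mathrm{MSOFF}.
\end{align*}

Since $\mathrm{MSOFF} \subseteq (\mathrm{ENC} \circ \mathrm{MTT} \circ \mathrm{DEC}) \cap \mathrm{LSI} \subseteq \mathrm{MFT} \cap \mathrm{LSI}$ we get equality. □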



Note that also the nondeterministic macro forest hierarchy collapses. In fact, 
the proof (in the OI case) is the same as the one above, only with Theorem 1
replaced by Theorem 3. 

Obviously, since the composition closure of MFT equals $\mathrm{ENC} \circ \mathrm{MTT}^* \circ \mathrm{DEC}$,
it follows from Theorem 2 that also for compositions of MFT it is decidable
whether or not they are of linear size increase. In Theorem 38 of [EM03a] it
is proved that also the composition closure of (total deterministic) pebble tree
transducers equals that of mtts. Since translations in $\mathrm{MTT} \cap \mathrm{LSI}$ can be computed
in linear time [Man02] we can formulate our decidability result as follows.

Corollary 2. For a composition of macro forest transducers or of pebble tree 
transducers it is decidable whether or not it is of linear size increase, and if it is 
then an equivalent linear time computable mtt (working on binary encodings) 
can be constructed effectively. 

Thanks. To Joost Engelfriet and the referees for helpful comments.

Abstract. We propose a notion of distributed games as a framework to
formalize and solve distributed synthesis problems. In general the problem
of solving distributed games is undecidable. We give two theorems
that make it possible to simplify, and possibly to solve, some distributed games. We
show how several approaches to distributed synthesis found in the literature
can be formalized and solved in the proposed setting.



1 Introduction 

Consider a system consisting of a process, an environment and possible ways 
of interaction between them. The synthesis problem is stated as follows: given a 
specification S, find a finite state program P for the process such that the overall 
behaviour of the system satisfies S, no matter how the environment behaves. 

In a distributed system, in general, one can have multiple processes. The 
system specifies possible interactions between the processes and the environment 
and also the interactions among the processes themselves. The synthesis problem 
here is to find a program for each of the processes such that the overall behaviour 
of the system satisfies a given specification. We call this distributed synthesis 
problem (DSP). 

In this paper we study DSP in a setting where there is a fixed set of processes 
with no interaction among themselves; they interact only with the environment. 
Thus, any communication among processes in the system must be channeled 
through the environment. This is the typical scenario, for example, in any com- 
munication network where processes are peer protocols and the environment is 
the stack of lower layers (including the communication medium) below them. 
Typical metaphors of communication and synchronization like channels, ren- 
dezvous, handshakes can be easily presented in our model. 

Earlier approaches. The distributed synthesis problem has been considered by 
Pnueli and Rosner in the setting of an architecture with fixed channels of
communication among processes [29]. They have shown that distributed synthesis is
undecidable for most classes of architectures. They have obtained decidability
for a special class of hierarchical architectures called pipelines. It must be noted
that the basic undecidability and lower bounds (in case of decidability) follow 
from much earlier results on the multi-player games of Peterson and Reif [27]. 
After the work of Pnueli and Rosner, the decidability results have been extended 
to branching time specifications over two-way pipelines and one-way rings [13]. 
These are essentially the only architectures for which the problem is decidable. 
Madhusudan and Thiagarajan [17] considered local specifications, i.e., a conjunc- 
tion of specifications for each of the processes. For such specifications the class of 
decidable architectures is slightly larger and includes doubly-flanked pipelines. 

The other approach to distributed synthesis, initiated roughly at the same 
time as the work of Pnueli and Rosner, comes from control theory of discrete 
event systems [30,15,32]. The system is given as a plant (deterministic transi- 
tion system) and the distributed synthesis problem is to synthesize a number 
of controllers, each being able to observe and control only a specific subset of 
actions of the plant. While the original problem refers only to safety properties, 
an extension to $\mu$-calculus specifications has also been considered [4]. Except
for some special cases, the problem turns out to be undecidable ([4,34]). It is one 
of the important goals of the area of decentralized control synthesis to identify 
conditions on a plant and a specification such that DSP is decidable. 

A different approach was suggested in [18]. The authors consider a setting
where processes communicate via handshaking, i.e., common actions. This setting
can easily encode undecidable architectures from the Pnueli and Rosner setting, so
the synthesis problem, even for local specifications, is undecidable. To get decidability
results the authors propose to restrict the class of allowed controllers.

Our approach. Game theory provides an approach to solving the (centralized) 
synthesis problem. An interaction of a process with its environment can be 
viewed as a game between two players [1,28,33]. Then the synthesis problem 
reduces to finding a finite-state winning strategy for the process. The winning 
strategy can then be implemented as the required program. This approach does 
not extend to DSP because there we have more than two parties. 

In this paper, we suggest an approach to DSP by directly encoding the prob- 
lem game-theoretically. We extend the notion of games to n players playing a 
game against a single hostile environment. We call this model distributed games. 
In this model, there are no explicit means of interaction among processes. Any 
such interaction must take place through the environment. Moreover, each player 
has only a local view of the global state of the system. Hence, a local strategy for 
a player is a function of its local history (of player’s own states and the partial 
view of the environment’s states). A distributed strategy is a collection of local 
strategies; one for each of the players. The environment in distributed games, 
on the other hand, has access to the global history. Any play in a distributed 
game consists of an alternating sequence of moves of (some of) the players and of
the environment. 

Distributed synthesis in this model amounts to finding a distributed winning 
strategy. This means finding a collection of local strategies that can win against 
the global environment. A side effect of the requirement that the players need
to win together is that they need to implicitly communicate when they make
their moves. The card game of bridge is a good example of the kind of implicit 
communication we have in mind. When n = 1, distributed games reduce to the 
usual two-party games. 

The main technical contributions of the paper are two theorems that allow the
simplification of distributed games. In general it is not decidable to check whether
there is a distributed winning strategy in a finite distributed game. The simplification
theorems make it possible to reduce the number of players and to reduce
nondeterminism in the game. In some cases, by repeated application of these theorems
we can simplify the game to one with only one player against the environment
(where the existence of a winning strategy is decidable). The other possibility 
is that after simplification we get a game where environment has no choice of 
moves. We show that in this case the existence of a distributed strategy is also 
decidable. This technique is enough to solve all decidable cases of distributed 
control problems mentioned above. 

Related works. Readers may find the model of distributed games very close to 
the models of distributed systems in [25,11]. The closeness is not accidental: the 
model was partly motivated by these and the later works of Halpern et al. [12, 
10] which explore the issue of knowledge in distributed systems. 

Distributed multi-player games have been studied extensively in classical 
game theory, both in the settings of cooperation and non-cooperation among 
the players [23,21]. There have been attempts to model and logically reason 
about cooperation as in [24,26]. Distributed games can be seen as a special 
type of concurrent processes - the models of Alternating Temporal Logic - with 
incomplete information [2], and the distributed systems model of Bradfield [7] 
(which generalizes ATL and integrates incomplete information). We consider 
our proposal as being something between these models and concrete synthesis 
problems, like the one for pipeline architectures. Our model is less general, but 
this permits us to obtain stronger results that allow us to solve concrete synthesis 
problems. 

Ambroszkiewicz and Penczek [3] study a class of games, also called distributed 
games, but the framework, questions and approaches are closer to classical game 
theory and different from ours. 

Organization of the paper. We start with a definition of games and distributed 
games. We give some simple properties of our model. In Sections 4 and 5 we 
formulate the two main theorems that allow distributed games to be simplified. In 
Section 6 we show how to use these theorems to solve the synthesis problem 
for pipelines. In the full version of the paper [20], we consider other distributed 
synthesis problems and provide all the proofs. 

2 Games 

A game $G$ is a tuple $(P, E, T \subseteq V \times V, \mathrm{Acc} \subseteq V^\omega)$ where $(V, T)$ is a graph with 
the vertices $V = P \cup E$ and $\mathrm{Acc} \subseteq V^\omega$ is a set defining the winning condition. 
We say that a vertex $x'$ is a successor of a vertex $x$ if $T(x, x')$ holds. We call $P$ 
the set of player vertices and $E$ the set of environment vertices. 

A play between player and environment from some vertex $v \in V$ proceeds as 
follows: if $v \in P$ then player makes a choice of a successor, otherwise environment 
chooses a successor; from this successor the same rule applies and the play goes 
on forever unless one of the parties cannot make a move. If a player cannot make 
a move he loses; similarly for the environment. The result of an infinite play is an 
infinite path $v_0 v_1 v_2 \cdots$. This path is winning for player if the sequence belongs 
to Acc. Otherwise environment is the winner. 

A strategy $\sigma$ for player is a function assigning to every sequence of vertices $\vec{v}$ 
ending in a vertex $v$ from $P$ a vertex $\sigma(\vec{v})$ which is a successor of $v$. The strategy 
is memoryless iff $\sigma(\vec{v}) = \sigma(\vec{w})$ whenever $\vec{v}$ and $\vec{w}$ end in the same vertex. 

A play respecting $\sigma$ is a sequence $v_0 v_1 \ldots$ such that $v_{i+1} = \sigma(v_0 \ldots v_i)$ for all $i$ 
with $v_i \in P$. The strategy $\sigma$ is winning from a vertex $v$ iff all the plays starting 
in $v$ and respecting $\sigma$ are winning. A vertex is winning if there exists a strategy 
winning from it. The strategies for the environment are defined similarly. 

In this paper all acceptance conditions $\mathrm{Acc} \subseteq V^\omega$ will be regular: that is, 
there will be a colouring $\lambda : V \to \mathit{Colours}$ of the set of vertices with a finite 
set of colours and a regular language $L \subseteq \mathit{Colours}^\omega$ that define the accepting 
sequences by: $\mathrm{Acc} = \{ v_0 v_1 \ldots \in V^\omega : \lambda(v_0)\lambda(v_1)\ldots \in L \}$. 

An important type of regular winning condition is a parity condition. It is a 
condition determined by a function $\Omega : V \to \{0, \ldots, d\}$ in the following way: 

$\mathrm{Acc} = \{ v_0 v_1 \ldots \in V^\omega : \liminf_{i \to \infty} \Omega(v_i) \text{ is even} \}$ 

Hence, in this case, the colours are natural numbers and we require that the 
smallest among those appearing infinitely often is even. The main results about 
games that we need are summarized in the following theorem. 

Theorem 1 ([19,9,22]). Every game with regular winning conditions is deter- 
mined, i.e., every vertex is winning for the player or for the environment. In a 
parity game a player has a memoryless winning strategy from each of his winning 
vertices. It is decidable to check if a given vertex of a finite game with a regular 
winning condition is winning for the player. 
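
The definitions of this section can be illustrated on a small finite game. The following Python sketch is not part of the paper: it assumes both parties use memoryless strategies given as dictionaries (the names `lasso_play` and `player_wins` are hypothetical), follows the resulting play until a vertex repeats, and then evaluates the parity condition on the cycle, since only the vertices on the cycle are visited infinitely often.

```python
def lasso_play(start, player_vertices, sigma, tau, successors):
    """Follow memoryless strategies sigma (player) and tau (environment).

    Returns (prefix, cycle). The infinite play is prefix followed by cycle
    repeated forever; cycle is empty when the play is finite because the
    owner of the last vertex has no move.
    """
    seen = {}      # vertex -> index of its first occurrence in path
    path = []
    v = start
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        if not successors.get(v):
            return path, []          # the party owning v is stuck
        choice = sigma[v] if v in player_vertices else tau[v]
        if choice not in successors[v]:
            raise ValueError("a strategy must choose a successor")
        v = choice
    k = seen[v]
    return path[:k], path[k:]


def player_wins(prefix, cycle, player_vertices, priority):
    """Decide the winner of the play described by (prefix, cycle)."""
    if not cycle:
        # Finite play: the party that cannot move loses.
        return prefix[-1] not in player_vertices
    # Parity condition: the smallest priority seen infinitely often is even.
    return min(priority[v] for v in cycle) % 2 == 0
```

This only evaluates one play for one pair of strategies; deciding whether a vertex is winning requires a game-solving algorithm, which the sketch does not attempt.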

3 Distributed Games 

A local game is any game G = (P, E, T) as above but without a winning condition 
and with the restriction that it is bipartite, i.e., a successor of a player move is 
always an environment move and vice versa. 

Let $G_i = (P_i, E_i, T_i)$, for $i = 1, \ldots, n$, be local games. A distributed game 
constructed from $G_1, \ldots, G_n$ is $G = (P, E, T, \mathrm{Acc} \subseteq (E \cup P)^\omega)$ where: 

1. $E = E_1 \times \cdots \times E_n$. 

2. $P = (P_1 \cup E_1) \times \cdots \times (P_n \cup E_n) \setminus E$. 

3. From a player’s position, we have $(x_1, \ldots, x_n) \to (x'_1, \ldots, x'_n) \in T$ if and 
only if $x_i \to x'_i \in T_i$ for all $x_i \in P_i$ and $x_i = x'_i$ for all $x_i \in E_i$. 

4. From environment’s position, if we have $(x_1, \ldots, x_n) \to (x'_1, \ldots, x'_n) \in T$ 
then for every $x_i$, either $x_i = x'_i$ or $x'_i \in P_i$, and moreover $(x_1, \ldots, x_n) \neq (x'_1, \ldots, x'_n)$. 

5. Acc is any winning condition. 

Observe that a distributed game is a bipartite game. Notice that there is an 
asymmetry in the definition of environment’s and player’s moves. In a move from 
player’s to environment’s position, all components which are players’ positions 
must change, and the change respects transitions in local games. In the move 
from environment’s to player’s position, all components are environment’s 
positions but only some of them need to change; moreover these changes need not 
respect local transitions. Hence, while global moves of the player are a kind 
of free product of moves in local games, this is not the case for the environment. 
The moves from environment positions are the only part of a distributed game 
that is not determined by the choice of components, i.e., of local games. This 
freedom makes it possible to encode different communication patterns and other 
phenomena. 
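
The asymmetry between the two kinds of moves can be made concrete with a small sketch. It is an illustration only, under the assumption that each local game is given as a triple of Python sets `(P_i, E_i, T_i)` with `T_i` a set of `(source, target)` pairs; the function names are hypothetical. Player moves are fully determined by the local games, whereas for the environment the definition only gives a necessary condition that a candidate move must satisfy.

```python
from itertools import product


def player_moves(position, local_games):
    """All global successors of a player position (rule 3).

    Every component that is a local player position must move along a
    local transition; every environment component stays where it is.
    """
    options = []
    for x, (P_i, E_i, T_i) in zip(position, local_games):
        if x in P_i:
            options.append([y for (s, y) in T_i if s == x])
        else:
            options.append([x])
    return set(product(*options))


def env_move_allowed(src, dst, local_games):
    """Necessary condition on an environment move (rule 4).

    Each component either stays unchanged or becomes a local player
    position, and at least one component changes. Which of the allowed
    moves are actually present is part of the game's specification.
    """
    if src == dst:
        return False
    return all(
        x == y or y in P_i
        for x, y, (P_i, _, _) in zip(src, dst, local_games)
    )
```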

We interpret a distributed game as a game of n players against environment. 
This intuition will become clear when we define the notions of views and 
local strategies. 

For an $n$-tuple $\eta$ and $i = 1, \ldots, n$, let $\eta[i]$ denote the $i$-th component of $\eta$. 
Similarly, for a sequence $\vec{v} = \eta_1 \eta_2 \cdots$ of $n$-tuples, let $\vec{v}[i] = \eta_1[i]\eta_2[i]\ldots$ denote 
the projection of the sequence on the $i$-th component. 

From the definition of the moves it is easy to observe that given a play $\vec{v}$ in 
a distributed game $G$, the projection of $\vec{v}$ to the positions of the $i$-th local game, 
$\vec{v}[i]$, is of the form $e_0^+ p_0 e_1^+ p_1 \ldots$ Note that the player’s positions do not repeat 
since as soon as the local game moves to a player position, it reacts immediately 
with an environment position. 

Definition 1. Consider a play $\vec{v}$ and let $e_0^+ p_0 e_1^+ p_1 \ldots$ be the projection of $\vec{v}$ on 
the $i$-th component. The view of process $i$ of $\vec{v}$ is $\mathit{view}_i(\vec{v}) = e_0 p_0 e_1 p_1 \ldots$ 



Definition 2. An $i$-local strategy is a strategy in the game $G_i$. A distributed 
(player) strategy $\sigma$ is a tuple of local strategies $(\sigma_1, \ldots, \sigma_n)$. 

A distributed strategy $\sigma$ defines a strategy in $G$ by 
$\sigma(\vec{v} \cdot (x_1, \ldots, x_n)) = (e_1, \ldots, e_n)$ 
where $e_i = x_i$ if $x_i \in E_i$ and $e_i = \sigma_i(\mathit{view}_i(\vec{v} \cdot x_i))$ otherwise. We 
call $\sigma$ the global strategy associated with the given distributed player strategy. 

Remark 1. It is important to note that thanks to the definition of distributed 
game any tuple of local strategies indeed defines a (global) strategy, i.e., it always 
suggests a valid move. 
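
As an illustration of how views and local strategies combine, here is a minimal Python sketch. It assumes plays are lists of tuples, components are indexed from 0 rather than 1, and local strategies are ordinary functions on views; the names `view` and `global_move` are hypothetical. A view is obtained by projecting the play on one component and collapsing each block of repeated environment positions.

```python
def view(play, i):
    """View of process i: the projection of the play on component i,
    with consecutive repetitions of the same position collapsed."""
    out = []
    for eta in play:
        x = eta[i]
        if not out or out[-1] != x:
            out.append(x)
    return tuple(out)


def global_move(play, local_strategies, player_sets):
    """Move of the global strategy associated with a tuple of local
    strategies, for a play ending in a global player position.

    Components in a local player position are moved by their local
    strategy applied to their own view; the others stay unchanged.
    """
    last = play[-1]
    move = []
    for i, x in enumerate(last):
        if x in player_sets[i]:
            move.append(local_strategies[i](view(play, i)))
        else:
            move.append(x)
    return tuple(move)
```

Each local strategy receives only its own view, which is why two plays that look identical to one process force that process to make the same move.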

Examples and easy observations. Consider the two local games $G_1$ and $G_2$, 
presented in Figure 1 where players’ positions are marked by squares and 
environment’s positions by circles. Observe that in the second game the moves of the 
player are more restricted than in the first game. 

Fig. 1. Local games 

Fig. 2. Global games 



Consider a (part of) distributed game built from $G_1$ and $G_2$ presented on the 
left part of Figure 2. Observe that in this game environment has fewer possibilities 
than it would have in the free product of $G_1$ and $G_2$. For example, from the 
position $e'_1$ in $G_1$ environment can move to $p_1$, similarly from $e'_2$ in $G_2$ it can 
move to $p_2$; but there is no move from $(e'_1, e'_2)$ to $(p_1, p_2)$ in the distributed game. 

Suppose that the winning condition in this game is to avoid environment’s 
positions where the components have different polarities, i.e., vertices $(e_1, e'_2)$ 
and $(e'_1, e_2)$. It is clear that there is a global winning strategy in this game. 
In position $(p_1, p_2)$ players should go to $(e_1, e_2)$ and in position $(p_1, p'_2)$ they 
should go to $(e'_1, e'_2)$. We claim that there is no distributed strategy. Suppose 
conversely that we have a distributed strategy $(\sigma_1, \sigma_2)$ which is winning from 
the vertex $(e_1, e_2)$. If environment moves to the position $(p_1, p'_2)$ then player 1 
should respond with $e'_1$. Hence $\sigma_1(e_1 p_1) = e'_1$. But now, if environment moves 
to $(p_1, p_2)$ then the view of player 1 of the play is the same, so he moves also to 
$\sigma_1(e_1 p_1) = e'_1$, which is a losing move. 

As another example consider the game on the right of Figure 2. This is almost 
the same game as before but for some more suppressed moves of the environment. 
Once again there is a global winning strategy in this game. But this time there 
is also a distributed strategy. Define $\sigma_1(\vec{v} p_1)$ to be $e'_1$ if the number of $p_1$ in $\vec{v}$ 
is even and to be $e_1$ otherwise. Let $\sigma_2(\vec{v} p_2) = e_2$ and $\sigma_2(\vec{v} p'_2) = e'_2$. It is easy to 
verify that $(\sigma_1, \sigma_2)$ is a distributed strategy winning from $(e_1, e_2)$. Observe that 
strategy $\sigma_1$ is not memoryless. It is easy to check that there is no distributed 
strategy which is a memoryless strategy for player 1. 

The following fact summarizes the observations we have made above. 

Proposition 1. There exist distributed games with a global winning strategy for 
the players but no distributed winning strategy. There exist distributed games with 
a memoryless global strategy but where all distributed strategies require memory. 

It is not difficult to show that it is not decidable to check if there is a dis- 
tributed winning strategy in a given distributed game. The argument follows the 
same lines as, for example, in [29,16,14,4]. 

Proposition 2. The following problem is undecidable: Given a finite distributed 
game check if there is a distributed winning strategy from a given position. 

Recall that by Theorem 1 it is decidable if there is a global winning strategy 
in a finite game. There are two cases when existence of a global winning strategy 
guarantees existence of a distributed winning strategy. The first is when we just 
have one player. The second is when a game is environment deterministic, i.e., if 
each environment position has exactly one successor (like in the second example 
above). 

Proposition 3. If there is a global winning strategy in an environment deter- 
ministic game then there is a distributed winning strategy from a given position. 

4 Division Operation 

Let us assume that we have a distributed game $G = (P, E, T, \mathrm{Acc})$ with $n + 1$ 
players constructed from local games $G_i = (P_i, E_i, T_i)$. We would like to 
construct an equivalent game with $n$ players. This will be possible if some of the 
players can deduce the global state of the game. 

Definition 3. A game $G$ is $i$-deterministic if for every environment position 
$\eta$ of $G$ and every $(\eta, \pi_1), (\eta, \pi_2) \in T$, if $\pi_1 \neq \pi_2$ then $\pi_1[i], \pi_2[i] \in P_i$ and 
$\pi_1[i] \neq \pi_2[i]$. 

Intuitively, the definition implies that player i can deduce the global position of 
the game from its local view. 
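
For a finite game, the definition can be checked directly. The sketch below is only an illustration: it assumes the environment moves are given as a dictionary from each environment position to the collection of its successors (the representation and the name `is_i_deterministic` are assumptions, not the paper's).

```python
def is_i_deterministic(env_moves, i, P_i):
    """Check i-determinism of a finite distributed game.

    env_moves: dict mapping each environment position (a tuple) to an
               iterable of its successor positions.
    i:         index of the component under consideration.
    P_i:       set of player positions of the i-th local game.
    """
    for eta, successors in env_moves.items():
        succs = list(set(successors))          # distinct successors only
        for a in range(len(succs)):
            for b in range(a + 1, len(succs)):
                x, y = succs[a][i], succs[b][i]
                # Two different moves must put component i into two
                # different local player positions.
                if x not in P_i or y not in P_i or x == y:
                    return False
    return True
```

With two moves from the same environment position that agree on the position of some player, the check fails for that player, which is the situation the gluing operation of the next section is meant to repair.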

We use two functions for rearranging tuples: $\mathit{flat}((x_0, x_n), x_1, \ldots, x_{n-1}) = 
(x_0, x_1, x_2, \ldots, x_n)$ and $\mathit{flat}^{-1}(x_0, x_1, x_2, \ldots, x_n) = ((x_0, x_n), x_1, \ldots, x_{n-1})$. We 
extend these functions point-wise to sequences. 
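
The two rearranging functions are simple enough to state as code. This is a direct transcription into Python tuples, with `flat_inv` as an assumed name for the inverse and `flat_seq` as an assumed name for the point-wise extension.

```python
def flat(t):
    """((x0, xn), x1, ..., x_{n-1})  ->  (x0, x1, ..., xn)"""
    (x0, xn), rest = t[0], t[1:]
    return (x0, *rest, xn)


def flat_inv(t):
    """(x0, x1, ..., xn)  ->  ((x0, xn), x1, ..., x_{n-1})"""
    return ((t[0], t[-1]), *t[1:-1])


def flat_seq(seq):
    """Point-wise extension of flat to a sequence of tuples."""
    return [flat(t) for t in seq]
```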

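Division operation. For the game $G$, we define $\mathrm{divide}(G) = (\tilde{P}, \tilde{E}, \tilde{T}, \widetilde{\mathrm{Acc}})$. It 
consists of the local games $\tilde{G}_i = (\tilde{P}_i, \tilde{E}_i, \tilde{T}_i)$ ($i = 0, \ldots, n-1$) where: 

- $\tilde{P}_i = P_i$, $\tilde{E}_i = E_i$ and $\tilde{T}_i = T_i$ for $i = 1, \ldots, n-1$; 

- $\tilde{E}_0 = E_0 \times E_n$ and $\tilde{P}_0 = (P_0 \cup E_0) \times (P_n \cup E_n) \setminus \tilde{E}_0$; 

- In $\tilde{T}_0$ we have transitions from $(p_0, e_n)$, $(e_0, p_n)$, $(p_0, p_n)$ to $(e_0, e_n)$ provided 
$(p_0, e_0) \in T_0$ and $(p_n, e_n) \in T_n$. 

The global moves from the environment positions in $\mathrm{divide}(G)$ and the 
acceptance condition are defined by: 

- $\eta \to \pi \in \tilde{T}$ if $\mathit{flat}(\eta) \to \mathit{flat}(\pi) \in T$, 

- $\widetilde{\mathrm{Acc}} = \{ \vec{v} : \mathit{flat}(\vec{v}) \in \mathrm{Acc} \}$. 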

Theorem 2. Let $G$ be a 0-deterministic and $n$-deterministic distributed game 
of $n+1$ players. For every position $\eta$ of $G$, there is a distributed winning strategy 
from $\eta$ iff there is one from $\mathit{flat}^{-1}(\eta)$ in $\mathrm{divide}(G)$. 

5 Gluing Operation 

Let us assume that we have a game $G = (P, E, T, \mathrm{Acc})$ constructed from $n + 1$ 
local games $G_0, \ldots, G_n$. We are going to define an operation glue which is like 
determinizing the behaviour of the environment for one of the players. This is 
sometimes a necessary step before being able to apply the division operation. As 
an example suppose that in a distributed game we have the moves from $(e_0, e_1)$ 
as on the left side of Figure 3. There are two transitions with the same player 1 
positions, so the game is not 1-deterministic and we cannot apply the division 
operation. Gluing together the possibilities for player 0 (as depicted on the right) 
we make this part of the game deterministic for both players. 

Fig. 3. Gluing operation 

For the gluing operation to work, the game should satisfy certain conditions. 
There are in fact two sets of conditions describing different reasons for the 
operation to work. The first is when the player being glued has almost complete 
information about the whole game. In the second, he almost does not influence 
the behaviour of other components. 

Definition 4. A game $G$ is I-gluable if it satisfies the following conditions. 

1. $G$ is 0-deterministic; 

2. $G$ has no 0-delays: if $(e_0, e_1, \ldots, e_n) \to (x_0, x_1, \ldots, x_n)$ then $x_0 \in P_0$; 

3. The winning condition Acc is a parity condition on player 0: there is a map 
$\Omega : (P_0 \cup E_0) \to \mathbb{N}$ such that $\vec{v} \in \mathrm{Acc}$ iff $\liminf_{i \to \infty} \Omega(\mathit{view}_0(\vec{v}))$ is even. 

Definition 5. A game $G$ is II-gluable if it satisfies the following conditions: 

1. The moves of other players are not influenced by player 0 moves, i.e., if 
$(e_0, e_1, \ldots, e_n) \to (x_0, x_1, \ldots, x_n)$ then for every other environment position 
$e'_0$ we have $(e'_0, e_1, \ldots, e_n) \to (x'_0, x_1, \ldots, x_n)$ for some $x'_0$. 

2. The moves of player 0 are almost context independent: there is an equivalence 
relation $\sim \subseteq (E_0 \times P_0)^2$ s.t. if $(e_0, e_1, \ldots, e_n) \to (p_0, x_1, \ldots, x_n)$ then for every 
$(e'_0, p'_0)$: $(e'_0, e_1, \ldots, e_n) \to (p'_0, x_1, \ldots, x_n)$ iff $(e'_0, p'_0) \sim (e_0, p_0)$. 

3. $G$ has no 0-delays. 

4. The winning condition is a conjunction of the winning condition for players 
1 to n and the condition for player 0. Additionally, the condition for player 
0 is the parity condition. 

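Glue operation. We define the game $\tilde{G} = \mathrm{glue}(G)$ of $n + 1$ players as follows 
(to make the notation lighter we will use abbreviated notation for tuples, for 
example we will write $(e_0, \bar{e})$ instead of $(e_0, e_1, \ldots, e_n)$): 

- $\tilde{P}_i = P_i$, $\tilde{E}_i = E_i$ and $\tilde{T}_i = T_i$ for all $i = 1, \ldots, n$; 

- $\tilde{P}_0 = \mathcal{P}(E_0 \times P_0)$ and $\tilde{E}_0 = \mathcal{P}(P_0 \times E_0)$; 

- $\tilde{p} \to_0 \tilde{e}$ if for every $(e, p) \in \tilde{p}$ there is $(p, e') \in \tilde{e} \cap T_0$; 

- $(\tilde{e}_0, \bar{e}) \to (\tilde{x}_0, \bar{x}) \in \tilde{T}$ for $\tilde{e}_0 \neq \emptyset$, where 
$\tilde{x}_0 = \{ (e_0, x_0) : \exists (p', e_0) \in \tilde{e}_0 .\ (e_0, \bar{e}) \to (x_0, \bar{x}) \}$; 

- $\widetilde{\mathrm{Acc}}$ will be defined shortly. 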

Consider $\tilde{u} = u_1 \ldots u_{2k} \in (\tilde{E}_0 \cdot \tilde{P}_0)^+$. It is a sequence of sets of pairs of 
nodes of the game $G_0$. A thread in $\tilde{u}$ is any sequence $e_1 p_1 \cdots e_k p_k \in (E_0 P_0)^+$ 
such that $(p_{i-1}, e_i) \in u_{2i-1}$ and $(e_i, p_i) \in u_{2i}$ for all $i = 1, \ldots, k$. Similarly we 
define threads for infinite sequences. Let $\mathit{threads}(\tilde{u})$ be the set of threads in $\tilde{u}$. 
We put: 

$\tilde{u} \in \widetilde{\mathrm{Acc}}$ iff every $\vec{v} \in \mathit{threads}(\mathit{view}_0(\tilde{u}))$ satisfies the parity condition $\Omega$ 
and $\tilde{u}$ satisfies the conditions for players $1, \ldots, n$ 

Observe that if a game is I-gluable then the winning condition is only on 0-th 
player and the second clause in the definition of $\widetilde{\mathrm{Acc}}$ is automatically true. 

Theorem 3. Let $G$ be an I-gluable or II-gluable game. There is a distributed 
winning strategy from a position $\eta$ in $G$ iff there is a distributed winning strategy 
from the position $\eta$ in $\mathrm{glue}(G)$. 

The proof of this theorem is relatively long and it is not presented here. 
While in principle it uses similar methods as in determinization of automata on 
infinite words, some arguments need to be refined [20]. 



6 Synthesis for Pipeline Architecture 

A pipeline is a sequence of processes communicating by unidirectional channels: 

We assume that the alphabets $A_0, \ldots, A_n$ are disjoint. The execution proceeds 
in rounds. Within a round, processes get inputs and produce outputs in a 
step-wise fashion. At the beginning of a round, process $C_n$ gets input $a_n \in A_n$ from 
the environment and gives an output $a_{n-1} \in A_{n-1}$. In the next step, this output 
is given as input to process $C_{n-1}$ and so on. When $C_1$ has given an output, the 
round finishes and another round starts. 

A local controller for the $i$-th component is a function $f_i : (A_i)^* \to A_{i-1}$. A 
sequence $a_0 b_0 a_1 b_1 \cdots \in (A_i \cdot A_{i-1})^\omega$ respects $f_i$ if $f_i(a_0 a_1 \ldots a_j) = b_j$ for all $j$. 

A pipeline controller $P$ is a tuple of local controllers $(f_1, \ldots, f_n)$, one for 
each component. An execution of the pipeline is a string in $(A_n A_{n-1} \cdots A_0)^\omega$. 
An execution $\vec{v}$ respects $P$ if $\vec{v}|(A_i \cup A_{i-1})$ respects $f_i$, for all $i = 1, \ldots, n$. 

Let $\Sigma = \bigcup_{i=0,\ldots,n} A_i$. A controller $P$ defines a set of $\Sigma$-labeled paths $L(P)$ 
which is the set of all the executions respecting $P$. 
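
To make the round structure concrete, the following sketch simulates a finite prefix of an execution that respects a given pipeline controller. It is an illustration under assumptions: local controllers are modelled as Python functions from the tuple of inputs received so far to an output, and the name `run_pipeline` is hypothetical.

```python
def run_pipeline(controllers, env_inputs):
    """Simulate the pipeline for one round per environment input.

    controllers: list [f_1, ..., f_n]; f_i maps the tuple of all inputs
                 that process C_i has received so far to its next output.
    env_inputs:  inputs from A_n supplied by the environment, one per round.

    Returns the execution prefix a_n, a_{n-1}, ..., a_0 for each round,
    concatenated into a single list.
    """
    n = len(controllers)
    histories = [[] for _ in range(n)]    # histories[i - 1]: inputs of C_i
    execution = []
    for a in env_inputs:
        execution.append(a)
        for i in range(n, 0, -1):         # C_n moves first, C_1 last
            histories[i - 1].append(a)
            a = controllers[i - 1](tuple(histories[i - 1]))
            execution.append(a)
    return execution
```

By construction, the projection of the returned sequence on the inputs and outputs of each process respects the corresponding local controller.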

The pipeline synthesis problem is: given a pipeline over alphabets $A_0, \ldots, A_n$ 
and a deterministic parity word automaton $\mathcal{A}$ over the alphabet $\Sigma = \bigcup_{i=0,\ldots,n} A_i$, 
find a pipeline controller $P = (f_1, \ldots, f_n)$ such that $L(P) \subseteq L(\mathcal{A})$. 

We would like to remark that in the proof presented here there is no difficulty 
to consider branching specifications, i.e., tree automata [13]. We restrict to word 
automata because we have no space to give the definition of automata on trees 
with nodes of varying degrees. 

Encoding into a game. A pipeline synthesis problem is coded as a distributed 
game $G = (P, E, T, \mathrm{Acc})$ constructed from local games $G_0, \ldots, G_n$, with $G_0$ 
taking the role of the automaton $\mathcal{A} = (Q, \Sigma, q^0, \delta : Q \times \Sigma \to Q, \Omega : Q \to \mathbb{N})$ 
and $G_i$ the role of the $i$-th component $C_i$. The game $G_0$ is: (1) $P_0 = Q \times \Sigma^{n+1}$, 
$E_0 = Q$; (2) $(q, w) \to q' \in T_0$ if $q' = \delta(q, w)$; and $q \to (q, w) \in T_0$ for all 
$w \in \Sigma^{n+1}$. 

For each component $i = 1, \ldots, n$ we have the game $G_i$ which is defined by: 
$P_i = A_i$; $E_i = (A_i \to A_{i-1})$; and there is a complete set of transitions between 
$P_i$ and $E_i$. 

From an environment position $(q, f_1, \ldots, f_n)$, for a letter $a_n \in A_n$ we have a 
move to $((q, w(a_n)), a_1, \ldots, a_n)$ where $w(a_n) = a_n a_{n-1} \ldots a_0$ is a word such that 
$a_{i-1} = f_i(a_i)$. 

The winning condition Acc is the set of sequences such that the projection 
on the states in the first component satisfies the parity condition of the 
automaton $\mathcal{A}$. Here we need to assume some special form of $\mathcal{A}$. This is because 
we “jump” over the states by using $\delta(q, w)$ for $w$ a word of $n + 1$ letters. We need 
to be sure that while doing this we do not jump over states of small priority. 
As the length of $w$ is fixed we can easily guarantee this. The initial position is 
$\eta_0 = (q^0, a_1, \ldots, a_n)$ for some arbitrarily chosen letters $a_1, \ldots, a_n$. 

Lemma 1. There is a distributed winning strategy in $G$ from $\eta_0$ iff the pipeline 
synthesis problem has a solution. A distributed winning strategy gives a controller 
for the pipeline. 

Decidability. We will abstract some properties of a pipeline game and show that 
for any game with these properties it is decidable to establish if there exists a 
distributed winning strategy. 

Definition 6. A game $G$ is $i$-sequential if for all environment positions $\eta_1$ and 
$\eta_2$: if $\eta_1 \to \pi_1$, $\eta_2 \to \pi_2$, $\eta_1[1, i] = \eta_2[1, i]$ and $\pi_1[1, i-1] \neq \pi_2[1, i-1]$ then 
$\pi_1[i] \neq \pi_2[i]$ and $\pi_1[i], \pi_2[i] \in P_i$. Here we use $\eta[1, i-1]$ to denote the subsequence 
of the sequence $\eta$ consisting of elements on positions from 1 to $i - 1$; note that 
tuple $\eta$ has also 0 position. 

Definition 7. We call a game $(0, n)$-proper if it satisfies the following: 

P1 $G$ is 0-deterministic, has no 0-delays, and its winning condition is a parity 
condition on player 0; 

P2 $G$ is $n$-deterministic; 

P3 $G$ is $i$-sequential for all $i \in \{1, \ldots, n\}$. 

A game is $(0, n)$-almost proper if it is proper except that it may not satisfy 
P2. Observe that the condition P3 does not imply P2, as sequentiality does not 
say anything about player 0. 

The following results are direct consequences of the definitions. 

Lemma 2. The following are true about (almost) proper games: 

- The pipeline game $G$ is $(0, n)$-proper. 

- A $(0, n)$-proper game $G$ is dividable and $\mathrm{divide}(G)$ is a $(0, n-1)$-almost 
proper game. 

- A $(0, n)$-almost proper game $G$ is I-gluable and $\mathrm{glue}(G)$ is a $(0, n)$-proper 
game. 

Corollary 1. Existence of a distributed strategy in a $(0, n)$-proper game is 
decidable. The synthesis problem for pipelines is decidable. 

Even though our definition of pipeline architecture is slightly different from 
those in Pnueli and Rosner [29] or Kupferman and Vardi [13], one can see that 
they can be captured by our definition. One can also see that two-way pipelines 
and one-way rings of [13] give rise to $(0, n)$-proper games. Hence, we get 
decidability results for all these classes of architectures at one go. The translation of 
two-way rings does not give $(0, n)$-proper games. Indeed the synthesis problem 
for this architecture is undecidable. 

7 Conclusions 

We have introduced a notion of distributed games as a framework for solving 
distributed synthesis problems (DSP). We have tried to make the model as 
specific as possible but still capable of easily encoding the particular instances 
of DSP found in the literature. This deliberate restriction of the model is the 
main difference between our approach and the general models proposed in the 
literature [23,2,7]. 

Having decided on a model we have looked for a set of tools that can be 
applied to different instances of DSP. We have given two theorems that allow 
distributed games to be simplified. We have shown that they can be used to solve the 
pipeline synthesis problem. In the full version of the paper [20] we consider three 
more problems: local specifications and double-flanked pipelines [17,16], synthe- 
sis for communicating state machines of Madhusudan and Thiagarajan [18], and 
decentralized control synthesis of Rudie and Wonham [32]. We give these exam- 
ples in order to show that the framework of distributed games is rich enough 
to model different synthesis problems proposed in the literature. This way we 
also show that the two simplification theorems we propose are powerful enough 
to solve known decidable cases of the distributed synthesis problem. The 
advantage of our approach is that we separate the proofs into two steps: coding 
the instance in question as a distributed game and applying the simplification 
theorems. While neither step is completely straightforward, they nevertheless 
allow some modularization of the proof and reuse of the general results on 
distributed games. 

We hope that distributed games will be useful in exploring the borderline 
between decidability and undecidability of DSP. For example, the only archi- 
tectures for which the DSP problem with local specifications is decidable are 
doubly flanked pipelines. The undecidability arguments for other architectures 
use quite unnatural specifications that require a process to guess what will be 
its next input. We hope to find interesting classes of pairs (architecture, specifi- 
cation) for which the DSP problem is decidable. We want to do this by encoding 
the problem into games and looking at the structure of resulting games. In a 
similar way we want to find decidable instances of DSP in the discrete control 
synthesis framework. 



Acknowledgments. The authors are very grateful to Julien Bernet and David 
Janin for numerous discussions and valuable contributions. 

Abstract. We present a space- and time-efficient algorithm for maintaining 
multidimensional histograms for data that is dynamic, i.e., subject to updates 
that may be increments or decrements. Both the space used and the per-update 
and computing times are polylogarithmic in the input data size; this is the 
first known algorithm in the data stream model for this problem with this 
property. 

One powerful motivation for studying data stream algorithms is 
the analysis of traffic logs from IP networks, where d-dimensional data (for 
small d) is common. Hence, our results are of great interest. The result 
itself is achieved by generalizing methods known for maintenance 
of unidimensional histograms under updates — finding significant tensor 
generalizations of one dimensional wavelets, approximating distributions 
by robust representations — and relationships amongst histograms such 
as those between tensor wavelets or hierarchical histograms and general 
histograms. 



1 Introduction 

Let $A_j$ be an $N \times N$ dynamic array^ at time $j$. Input is a series of updates. The 
$j$th input is $(i, k, c_j)$, which updates $A_{j-1}$ to be $A_j$, where 

$A_j[i, k] = A_{j-1}[i, k] + c_j$ 
$A_j[i', k'] = A_{j-1}[i', k']$ if $(i', k') \neq (i, k)$. 

A histogram $H$ of $B$ buckets is a partition of the indices of $A_j$ (written $A$ when 
$j$ is understood) into a set $R$ of $B$ axis-parallel rectangular regions together 
with a set $\{h_r \mid r \in R\}$ of bucket heights, one for each region. A query $[i, k]$ in 
region $r$ might be answered approximately as $H[i, k] = h_r$ instead of exactly as 
$A[i, k]$. We assess the goodness of $H$ for $A$ in terms of sum-square-error (SSE), 
or equivalently, the square of the $L_2$ norm: $\|A - H\|_2^2 = \sum_{i,k} (A[i, k] - H[i, k])^2$. 
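
The definitions above translate directly into a brute-force computation, shown here as a hedged Python sketch. The representation of a bucket as a half-open rectangle `(i_lo, i_hi, k_lo, k_hi)` with a height, and all function names, are assumptions made for the example. This stores the whole array and is therefore not a streaming algorithm; it only illustrates what is being approximated.

```python
def apply_update(A, i, k, c):
    """Apply the update (i, k, c): add c to A[i][k], leave the rest as is."""
    A[i][k] += c


def histogram_value(buckets, i, k):
    """Answer the query [i, k] from a histogram.

    buckets: list of ((i_lo, i_hi, k_lo, k_hi), height) pairs whose
             half-open rectangles partition the index set.
    """
    for (i_lo, i_hi, k_lo, k_hi), height in buckets:
        if i_lo <= i < i_hi and k_lo <= k < k_hi:
            return height
    raise ValueError("index not covered by any bucket")


def sse(A, buckets):
    """Sum-square-error between the array A and the histogram."""
    return sum(
        (A[i][k] - histogram_value(buckets, i, k)) ** 2
        for i in range(len(A))
        for k in range(len(A[i]))
    )
```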

The multidimensional histogram maintenance problem is to maintain a 
histogram $H_j$ for $A_j$ over updates $j$ and support hist queries. The hist query at time 
$j$ is required to return a $B$-bucket histogram for $A_j$ of smallest SSE. We are 
primarily concerned with approximate versions of this problem, in which the 
number of buckets in the returned histogram may be greater than $B$ and the SSE may 
be greater than the SSE $\|A - H_{\mathrm{opt}}\|_2^2$ of the best $B$-bucket histogram $H_{\mathrm{opt}}$. 
Formally, the $(\alpha, \beta)$-approximate multidimensional histogram maintenance problem 
(or $(\alpha, \beta)$-AMHM problem) takes as input a sequence of updates and hist queries, 
and, for each hist query, outputs a histogram with at most $\alpha B$ buckets and SSE 
at most $\beta \|A - H_{\mathrm{opt}}\|_2^2$. 

* Supported by NSF EIA 0087022, NSF ITR 0220280 and NSF EIA 02-05116. 

^ We have defined the problem for square arrays in two dimensions for simplicity. Our 
results extend naturally to non-square rectangular arrays and, for the most part, to 
higher dimensions. 

In the traditional setting, i.e., the offline version of the problem, all updates 
are increments to distinct indices and precede a single hist query. If $A$ is of 
one dimension, the exact histogram problem can be solved in time and space 
polynomial in $N$ by dynamic programming [8]. The two (and higher) dimensional 
problems are known to be NP-hard, but, in two dimensions, an $(O(1), 1)$-approximation 
can be computed in polynomial time and space [10]. For example, 
the authors in [10] give an algorithm to produce the best hierarchical histogram. 
A hierarchical histogram is one with a hierarchical bucketing, i.e., in which the 
bucketing is obtained by repeatedly partitioning one of the existing buckets into 
two pieces. See [10] for pictures and details. Since any bucketing of $B$ buckets 
in two dimensions can be refined to a hierarchical bucketing of $4B$ buckets [2], 
a $(4, 1)$-approximation to the general problem results [10]. The authors in [10] 
also give a $(2, 1)$-approximation, using different techniques. 

Our focus is the emerging context of streaming where updates and hist queries 
are interleaved online, and we are only allowed polylog space overall and only 
polylog time per update and/or hist query. Typically $B \ll N$, so factors polynomial 
in $B$ are fine; but functions of $N$ and $\|A\|$ should be polylogarithmic. This is 
the emerging model for massive data analysis where the data is truly massive so 
it is desirable to compute while making only one pass over the data, or in key 
applications like analyzing extremely fast data sources such as IP network traffic 
logs, web click streams, spatial measurements from satellites, etc. and monitor- 
ing database contents with rapid transactions. Streaming models and algorithms 
have drawn tremendous focus recently with surveys [3,9], tutorials [4,13], work- 
shop [11], and more. 

We study the AMHM problem in the streaming model. Our motivation is 
clear: most data streams are inherently multidimensional, for small $d$. For 
example, in the IP network case, traffic of IP “flows” generates a log data stream 
with fields source IP address, destination IP address, source/destination ports, 
number of bytes or packets, etc. Summarizing such data streams calls for 
multidimensional histograms because they are piecewise constant approximations 
of the underlying data. A recent result [5] presented $(1, 1+\epsilon)$-approximations 
for the one-dimensional version of this problem; provably, one cannot obtain 
the optimal histogram in the data stream model. However, no result of similar 
nature was known for the two-dimensional histogram problem. For example, [12] 
gives a variety of results that take just polylog space but require i7{N^) time 
for hist queries. This is an important hole in our knowledge of data stream 
summarization, and often cited amongst researchers in this emerging area. 







S. Muthukrishnan and M. Strauss 



In this paper, we present the first algorithm for AMHM to use polylog space 
and polylog time for updates and hist queries. More formally, we show: 

Theorem 1. (Streaming) Consider an $N \times N$ integer-valued signal $A$. We can solve 
the $B$-bucket, $(4, 1+\epsilon)$-AMHM problem using $(B \log\|A\| \log(N)/\epsilon)^{O(1)}$ space, 
per-item time, and time per hist query. 

If we were to use the streaming algorithm above for the offline case, total time 
would be $TN^2$, where $T \le (B \log\|A\| \log(N)/\epsilon)^{O(1)}$ is the per-item time of the 
dynamic algorithm, which is unacceptably higher than linear time $c_1 N^2$. Instead, we use 
the streaming algorithm differently to obtain the first algorithm to use polylog 
space and linear total time for the offline problem. More formally, we show: 

Theorem 2. (Offline) Consider an $N \times N$ integer-valued signal $A$. For static data, 
for constants $c_1$ and $c_2$, we can $(4, 1+\epsilon)$-approximate the best $B$-bucket histogram 
in time $c_1 N^2 + (B \log\|A\| \log(N)/\epsilon)^{c_2}$ and space $(B \log\|A\| \log(N)/\epsilon)^{c_2}$, making 
one pass over the data, with the order we specify to lay out $A$. 

Both the results above are achieved by using many of the techniques that 
have been developed in the context of one-dimensional histograms [5,6], combining 
them with techniques known for offline multidimensional histogramming [10], 
and extending them. The extensions are in technical aspects such as 
(a) proving that a hierarchical histogram approximation to a robust 
multidimensional representation is in fact a good approximation to the best histogram (we 
provide a simple proof extending the ideas in one dimension to two-dimensional 
hierarchical histograms), (b) generalizing the wavelet-based approach in [5] to 
tensor products, (c) finding the significant tensor wavelet terms efficiently (we 
work directly on the tensor wavelet terms as updates come in, so this task is 
simpler than in previous results), etc. The result involves several technical 
details, some of which are discussed in this writeup; the rest will be in the final 
version. 

2 An Ideal Algorithm 

Our algorithm relies on wavelet tensor products. We first define Haar wavelets 
on a linear array of length N, then define tensor products of these, which are 
wavelets on the NxN square grid. 

Definition 1. Let $N$ be a power of 2. We define $N - 1$ proper wavelet functions 
on $[0, N)$ as follows. For integer $j$, $0 \le j < \log(N)$, and integer $k$, $0 \le k < 2^j$, 
we define a proper wavelet by 

$\psi(x) = \begin{cases} -\sqrt{2^j/N}, & x \in [kN/2^j,\ kN/2^j + N/2^{j+1}) \\ +\sqrt{2^j/N}, & x \in [kN/2^j + N/2^{j+1},\ (k+1)N/2^j) \\ 0, & \text{otherwise.} \end{cases}$ 

Additionally, we define a wavelet function $\phi$, also known as a scaling function, 
that takes the value $+1/\sqrt{N}$ over the entire $[0, N)$ linear grid. 

Support of a vector $v$, denoted $\mathrm{supp}(v)$, is $\{t : v(t) \neq 0\}$. Thus the support 
of a wavelet vector is either the entire interval $[0, N)$ or, recursively, the left half 
or right half of the support of some other wavelet (its parent in a naturally-defined 
binary tree). Each wavelet is constant on the left half and right half of 
its support. The wavelet $\phi$ is constant on its support; each other wavelet takes 
values on its left and right halves that are negatives of each other. The set of 
possible supports of a wavelet is the set of dyadic intervals of length at least 2. 
(A dyadic interval is either the whole space, which is a power of two in length, or, 
recursively, a dyadic interval in the left or right half.) 

The following well-studied properties of wavelets are easy to check. 

– There are $N$ wavelet functions on $[0, N)$ forming an orthonormal basis, that 
is, each function has norm 1 and any pair of distinct functions is orthogonal. 
This is because the set of supports of wavelets is hierarchical and a wavelet 
more coarse than a given proper wavelet $\psi$ will be constant on $supp(\psi)$ while 
the average of $\psi$ on its support is zero. 

– Each point $x$ in $[0, N)$ is in the support of $O(\log(N))$ wavelets, one proper 
wavelet at each resolution level. 

– Each characteristic function $\chi_I$ on an interval $I$ can be written as the sum 
of $O(\log(N))$ wavelets, namely the wavelets whose support includes an endpoint of 
$I$. Other wavelets will have zero dot product with $\chi_I$. 

– Each wavelet function is piecewise-constant of at most $O(1)$ pieces. 
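
As a quick sanity check of Definition 1 and the properties above, the following sketch builds the scaling function and the $N-1$ proper wavelets as rows of a matrix and tests orthonormality numerically. The function name `haar_basis` and the use of NumPy are illustrative assumptions, not part of the original text.

```python
import numpy as np

def haar_basis(n):
    """Rows: the scaling function phi, then the n - 1 proper Haar wavelets.

    Assumes n is a power of 2 (hypothetical helper for illustration).
    """
    levels = n.bit_length() - 1
    assert 2 ** levels == n, "n must be a power of 2"
    rows = [np.full(n, 1.0 / np.sqrt(n))]            # scaling function phi
    for j in range(levels):                           # resolution level j
        half = n // 2 ** (j + 1)                      # length of each half of the support
        amp = np.sqrt(2 ** j / n)
        for k in range(2 ** j):                       # position k within level j
            psi = np.zeros(n)
            start = k * n // 2 ** j
            psi[start:start + half] = -amp            # left half of the support
            psi[start + half:start + 2 * half] = amp  # right half of the support
            rows.append(psi)
    return np.array(rows)

W = haar_basis(8)
is_orthonormal = np.allclose(W @ W.T, np.eye(8))      # norm 1, pairwise orthogonal
supports_per_point = (W != 0).sum(axis=0)             # wavelets covering each point
```

Here `supports_per_point` counts, for each point of the grid, the scaling function plus one proper wavelet per resolution level.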

Now we turn to tensor products of Haar wavelets (TPWs), which we will use 
on the N X N grid. 

Definition 2. Let $\psi_1$ and $\psi_2$ be two wavelets on $[0, N)$ and let $(x, y)$ be an index 
into the $N \times N$ grid. Define $(\psi_1 \otimes \psi_2)(x, y)$ to be $\psi_1(x)\psi_2(y)$. 

The following well-known properties of tensor products and Haar wavelets 
are easy to check. 

– There are $N^2$ TPWs, since there are $N$ choices for each of $\psi_1$ and $\psi_2$. 

– Each TPW has norm 1, since 

$$\|\psi_1 \otimes \psi_2\|_2^2 = \sum_x \sum_y \psi_1(x)^2 \psi_2(y)^2 = \sum_x \psi_1(x)^2 \sum_y \psi_2(y)^2 = 1.$$

– Similarly, each pair of distinct TPWs is orthogonal. Thus the TPWs form 
an orthonormal basis. 

– Each element $(x, y)$ in the $N \times N$ grid is in the support of $O(\log^2(N))$ TPWs, 
since $(x, y) \in supp(\psi_1 \otimes \psi_2)$ iff $x \in supp(\psi_1)$ and $y \in supp(\psi_2)$. 

– Each characteristic function $\chi_{I \times J}$ on a rectangle $I \times J$ is the sum of 
$O(\log^2(N))$ TPWs: the tensor product of one of $O(\log(N))$ wavelets whose 
support contains an endpoint of $I$ with one of $O(\log(N))$ wavelets whose 
support contains an endpoint of $J$. 

– Each TPW is rectanglewise-constant on at most 4 rectangles. 
Specifically, the support of each TPW $\psi_1 \otimes \psi_2$ is a dyadic rectangle $I \times J$ 
(the product of two dyadic intervals). If neither $I$ nor $J$ is the whole space, 
then $\psi_1$ and $\psi_2$ are proper wavelets (not the average of everyone, $\phi$), and 
$\psi_1 \otimes \psi_2$ is constant on each of four similar dyadic rectangles in a 2-by-2 
quartering of $I \times J$. If $\psi_1 = \phi$, then $\psi_1 \otimes \psi_2$ is constant on each of 2 halves 
of $[0, N) \times J$ (splitting $J$ in half), and, similarly, if $\psi_2 = \phi$, then $\psi_1 \otimes \psi_2$ 
is constant on each of 2 halves of $I \times [0, N)$. Finally, $\phi \otimes \phi$ is constant on 
all of $[0, N) \times [0, N)$. 
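
The properties above can be checked numerically on a small grid. The sketch below is only an illustration under stated assumptions: `haar_basis` is a hypothetical helper, the grid size 16 and the rectangle are arbitrary choices, and the TPW coefficients of a signal are computed as the matrix product $W A W^T$, where the rows of $W$ are the one-dimensional wavelets.

```python
import numpy as np

def haar_basis(n):
    """Rows: scaling function, then the n - 1 proper Haar wavelets (n a power of 2)."""
    rows = [np.full(n, 1.0 / np.sqrt(n))]
    size = n                                   # support length at the current level
    while size >= 2:
        for start in range(0, n, size):
            psi = np.zeros(n)
            psi[start:start + size // 2] = -1.0 / np.sqrt(size)
            psi[start + size // 2:start + size] = 1.0 / np.sqrt(size)
            rows.append(psi)
        size //= 2
    return np.array(rows)

n = 16
W = haar_basis(n)

# A single TPW is the outer product of two one-dimensional wavelets.
tpw = np.outer(W[3], W[5])
tpw_norm_sq = float((tpw ** 2).sum())

# Characteristic function of the rectangle I x J, with I = [3, 11) and J = [5, 6).
chi = np.zeros((n, n))
chi[3:11, 5:6] = 1.0

coeffs = W @ chi @ W.T                         # coeffs[a, b] = <chi, psi_a (x) psi_b>
nonzero = int((np.abs(coeffs) > 1e-12).sum())  # number of TPWs actually needed
rebuilt = W.T @ coeffs @ W                     # sum of coefficient times TPW
```

Only TPWs built from wavelets that straddle an endpoint of $I$ or $J$ (plus the scaling function) receive a nonzero coefficient, which is where the $O(\log^2(N))$ bound comes from.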

Readers can observe that TPWs differ from another generalization of one 
dimensional Haar wavelets to two (or multi) dimensions where one only considers 
“square” supports and not products of dyadic intervals as we do here. 

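Let $\langle\cdot,\cdot\rangle$ denote the dot product. The Parseval equality states that 

$$\|A\|_2^2 = \sum_{\psi} \langle A, \psi\rangle^2.$$

The best $k$-term representation for signal $A$ by TPWs is, by the Parseval 
equality, $R = \sum_{i \in \Lambda} c_i\psi_i$ where $c_i = \langle A, \psi_i\rangle$ and $\Lambda$ of size $k$ maximizes $\sum_{i \in \Lambda} c_i^2$. 
This is because the signal is exactly recoverable as $A = \sum_i \langle A, \psi_i\rangle\psi_i$, so the 
error of $R$ is 

$$\|A - R\|^2 = \sum_{i \in \Lambda} (c_i - \langle A, \psi_i\rangle)^2 + \sum_{i \notin \Lambda} \langle A, \psi_i\rangle^2.$$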

Alternatively, one could describe finding the best $k$-term representation 
greedily, by finding the best 1-term representation $c_i\psi_i$, subtracting it off, and 
finding the best $(k-1)$-term representation to $A - c_i\psi_i$. The greedy formulation 
will be more useful to us, as we generalize it to finding a near-best 1-term 
representation later. 
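
The equivalence between the top-$k$ and the greedy formulations holds for any orthonormal basis, so it can be illustrated without building TPWs. The sketch below is an assumption-laden illustration: it uses a random orthonormal basis obtained from a QR factorization in place of the TPW basis, and the names `best_k_term` and `greedy_k_term` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 16, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns: an orthonormal basis
a = rng.standard_normal(n)                         # the signal

def best_k_term(a, Q, k):
    """Keep the k coefficients of largest magnitude (justified by Parseval)."""
    c = Q.T @ a
    keep = np.argsort(-np.abs(c))[:k]
    return Q[:, keep] @ c[keep], keep

def greedy_k_term(a, Q, k):
    """Repeatedly take the best 1-term representation of the residual."""
    rep, residual = np.zeros_like(a), a.copy()
    for _ in range(k):
        c = Q.T @ residual
        i = int(np.argmax(np.abs(c)))
        rep += c[i] * Q[:, i]
        residual -= c[i] * Q[:, i]
    return rep

R, kept = best_k_term(a, Q, k)
c = Q.T @ a
tail_energy = float((c ** 2).sum() - (c[kept] ** 2).sum())  # energy of dropped terms
error = float(((a - R) ** 2).sum())                          # squared error of R
R_greedy = greedy_k_term(a, Q, k)
```

The squared error of `R` equals the energy of the dropped coefficients, and the greedy procedure arrives at the same representation.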

There are clear relationships between TPWs and histograms as representa- 
tions of signals. For example, 

– The best $O(B\log^2(N))$-term TPW representation to $A$ is a $(O(\log^2(N)), 1)$-approximation 
to the best $B$-term histogram representation to $A$. 

The best $B$-bucket histogram is an $O(B\log^2(N))$-term TPW representation, 
so the error of the TPW representation is as claimed. Since an $O(B\log^2(N))$-term 
TPW representation is also an $O(B\log^2(N))$-bucket histogram, the result follows. 

– The best $B$-bucket histogram to the best $k$-term TPW representation, for 
some $k$ at most $O(B\log^2(N))$, is a $(1, 9)$-approximation to the best $B$-bucket 
histogram to $A$. 

Let $R$ be the best $k$-term representation, where $k$, as above, is large enough 
that any $B$-bucket histogram is a $k$-term TPW representation. Let $H$ be the 
best $B$-bucket representation to $R$, and let $H_{opt}$ denote an optimal $B$-bucket 
histogram. We have, using the triangle inequality, 

$$\begin{aligned} \|A - H\| &\le \|A - R\| + \|R - H\| \\ &\le \|A - R\| + \|R - H_{opt}\| \\ &\le \|A - R\| + \|R - A\| + \|A - H_{opt}\| \\ &\le 3\|A - H_{opt}\|, \end{aligned}$$

which gives the claim since the error is the square of the $L_2$ norm. 

Our main result is obtained by the following theorem which relates TPW 
representation to histogram representation in a more subtle manner. 

Theorem 3. For some $T \le O(1/(\epsilon^2 \log(1/\epsilon)))$ and any signal $A$, let $R$ be the 
representation formed by greedily taking $B\log^2(N)$ TPW terms at a time until 
either 

– we get $TB\log^2(N)$ terms, or 

– taking an additional $B\log^2(N)$ TPW terms improves (reduces) the residual 
$\|A - R\|_2^2$ by a factor no better than $(1 - \epsilon')$, where $\epsilon' \ge \Omega(\epsilon^2)$. 

Let $H$ be the best $B$-bucket representation to $R$. Then $H$ is a $(1, 1 + \epsilon)$-approximation 
to the best $B$-bucket histogram to $A$. 

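Proof. Suppose $R$ has $TB\log^2(N)$ terms. Then 

$$\|A - R\|_2^2 \le (1 - \epsilon')^{T-1}\,\|A - R^{(B\log^2(N))}\|_2^2,$$

where $R^{(B\log^2(N))}$ is the best representation of $B\log^2(N)$ terms. By choice of 
$T = 1 + (1/\epsilon')\log(25/\epsilon^2)$, we have $(1 - \epsilon')^{T-1} \approx \epsilon^2/25$ and, by previous 
observations, $\|A - R^{(B\log^2(N))}\| \le \|A - H_{opt}\|$, and 

$$\begin{aligned} \|A - H\|_2 &\le \|A - H_{opt}\|_2 + 2\|A - R\|_2 \\ &\le (1 + 2\epsilon/5)\,\|A - H_{opt}\|_2, \end{aligned}$$

so that $\|A - H\|_2^2 \le (1 + \epsilon)\|A - H_{opt}\|_2^2$ for sufficiently small $\epsilon$. 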

Now suppose $R$ has fewer than $TB\log^2(N)$ terms, so that $R$ together with any 
additional $B\log^2(N)$ terms has error at least $(1 - \epsilon')\|A - R\|_2^2$. It follows that 
the $B\log^2(N)$ terms in $H$ or $H_{opt}$ do not improve the square error of $R$ by 
more than the factor $(1 - \epsilon')$. Let $\hat{H}$, regarded as a histogram or TPW 
representation, be the best linear combination of $R$ and $H$; our hypothesis is that 
$\|A - \hat{H}\|^2 \ge (1 - \epsilon')\|A - R\|^2 = \|A - R\|^2 - \epsilon'\|A - R\|^2$. Note that, since $\hat{H}$ is 
the best representation on a specified line (containing $R$ and $H$), it follows that 
there is a right angle at $\hat{H}$, so that, as in Figure 1, we have 

$$\begin{aligned} \|R - \hat{H}\|^2 &= \|R - A\|^2 - \|A - \hat{H}\|^2 \\ &\le \epsilon'\|A - R\|^2 \\ &\le (\epsilon^2/9)\|A - R\|^2. \end{aligned}$$

We have, using Cauchy-Schwarz in the penultimate line, 

$$\begin{aligned} \|A - H\|^2 &= \|A - \hat{H}\|^2 + \|\hat{H} - H\|^2 \\ &= \|A - \hat{H}\|^2 + \left(\|R - H\| \pm \|R - \hat{H}\|\right)^2 \\ &= \|A - \hat{H}\|^2 + \|R - \hat{H}\|^2 \pm 2\|R - H\|\,\|R - \hat{H}\| + \|R - H\|^2 \\ &= \|A - R\|^2 + \|R - H\|^2 \pm 2\|R - H\|\,\|R - \hat{H}\| \\ &\subseteq \|A - R\|^2 + \|R - H\|^2 \pm 2(\epsilon/3)\|R - H\|\,\|A - R\| \\ &\subseteq \|A - R\|^2 + \|R - H\|^2 \pm (\epsilon/3)\left(\|R - H\|^2 + \|A - R\|^2\right) \\ &= (1 \pm \epsilon/3)\left(\|A - R\|^2 + \|R - H\|^2\right). \end{aligned}$$

Above we used only the robustness of $R$ and the fact that $H$ is a $B$-bucket 
histogram. That is, for any $B$-bucket histogram $H$, robustness of $R$ ensures that 
there is a near-right-angle at $R$, so that 

$$\|A - H\|^2 = (1 \pm \epsilon/3)\left(\|A - R\|^2 + \|R - H\|^2\right).$$

Thus, similarly, 

$$\|A - H_{opt}\|^2 = (1 \pm \epsilon/3)\left(\|A - R\|^2 + \|R - H_{opt}\|^2\right).$$

Note that $H$ is an optimal $B$-bucket histogram for $R$ and $H_{opt}$ is another $B$-bucket 
histogram; it follows that $\|R - H\| \le \|R - H_{opt}\|$. Thus we have 

$$\|A - H\|^2 \le (1 + \epsilon/3)\left(\|A - R\|^2 + \|R - H_{opt}\|^2\right)$$

and 

$$\|A - H_{opt}\|^2 \ge (1 - \epsilon/3)\left(\|A - R\|^2 + \|R - H_{opt}\|^2\right).$$

Thus we have 

$$\|A - H\|^2 \le \frac{1 + \epsilon/3}{1 - \epsilon/3}\,\|A - H_{opt}\|^2 \le (1 + \epsilon)\|A - H_{opt}\|^2,$$

for sufficiently small $\epsilon$, as desired. 

Fig. 1. Illustration of histograms in Theorem 3. The representations $H$, $R$, and $\hat{H}$ are 
colinear, but $R$ may or may not be between $H$ and $\hat{H}$. By optimality of $\hat{H}$, there is a 
right angle as indicated. 



The representation $R$ in the theorem above is the two dimensional analog 
of what is known as the robust representation in [5]. The entire theorem is the 
generalization of the proof that the robust representation is good for approximating 
the best histogram in one dimension found in [5,6], but the proof here is 
extended to higher dimensions and simplified. For higher dimensions, the 
polynomial in polylog factors has exponent $d$, the number of dimensions. 
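
The stopping rule of Theorem 3 can be sketched on a vector of already-known coefficients. This is a minimal illustration, not the streaming procedure: the function name `robust_term_count` and its parameters `block` (standing for $B\log^2(N)$), `max_blocks` (standing for $T$) and `eps_prime` are hypothetical, and the coefficients are assumed to be available exactly.

```python
import numpy as np

def robust_term_count(coeffs, block, max_blocks, eps_prime):
    """Greedily take `block` largest remaining terms at a time.

    Stops after `max_blocks` blocks, or as soon as one more block would leave
    a residual of at least (1 - eps_prime) times the current residual.
    Returns (number of terms taken, remaining squared error).
    """
    energies = np.sort(np.asarray(coeffs, dtype=float) ** 2)[::-1]
    residual = float(energies.sum())
    taken = 0
    for _ in range(max_blocks):
        gain = float(energies[taken:taken + block].sum())
        if residual - gain >= (1 - eps_prime) * residual:
            break                      # improvement no better than (1 - eps_prime)
        taken += block
        residual -= gain
    return min(taken, len(energies)), residual

# Two large coefficients followed by a flat tail: the rule stops after one block.
stopped_early = robust_term_count([10, 8, 1, 1, 1, 1, 1, 1, 1, 1], 2, 4, 0.5)
# Geometrically decaying coefficients: the cap of max_blocks blocks is reached.
hit_cap = robust_term_count([16, 8, 4, 2, 1, 0.5], 1, 3, 0.5)
```

In the first call the second block would reduce the residual from 8 to 6, which is not better than the factor $1 - \epsilon' = 0.5$, so the representation stops at two terms.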

The idealized algorithm is now clear at a high level. We follow the greedy 
algorithm in the theorem above and our task then becomes computing $H$, the 
best $B$-bucket representation to $R$. From the theorem above, $R$ has at most 
$S = TB\log^2(N)$ terms. Let $H_{rob}$ be the histogram determined by $R$, since 
TPWs can be thought of as histograms. As discussed earlier, finding the best 
$B$-bucket histogram is NP-Hard in general [10]. So, instead, we find and output the 
best $4B$-bucket hierarchical histogram $H$ for $H_{rob}$, which suffices as we discussed 
earlier. This is done quickly by observing that the boundaries of $H$ may be chosen 
from the $O(S^2)$ boundaries of $H_{rob}$, and using dynamic programming, as follows. 
The best hierarchical histogram approximation to $H_{rob}$ on some rectangle 
is obtained by finding the best top level partition of the rectangle into two pieces and then 
finding the best hierarchical histogram on each piece. Such an $H$ is a $(1, 1 + \epsilon)$-approximate 
hierarchical histogram for $A$, so $H$ is a $(4, 1 + \epsilon)$-approximation to 
the best $B$-bucket general histogram for $A$. 
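
The dynamic program described above can be sketched on a small explicit array. This is an illustration under simplifying assumptions: cuts are tried at every grid line rather than only at the boundaries of $H_{rob}$, each bucket is represented by its mean, and the function name `best_hierarchical_histogram` is hypothetical.

```python
from functools import lru_cache
import numpy as np

def best_hierarchical_histogram(A, buckets):
    """Least squared error of a hierarchical histogram with at most `buckets` buckets."""
    A = np.asarray(A, dtype=float)
    # 2D prefix sums give the sum and sum of squares over any rectangle in O(1).
    P1 = np.pad(A, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    P2 = np.pad(A * A, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

    def rect(P, x0, x1, y0, y1):
        return P[x1, y1] - P[x0, y1] - P[x1, y0] + P[x0, y0]

    def sse(x0, x1, y0, y1):
        """Squared error of one bucket on [x0, x1) x [y0, y1) using its mean."""
        s = rect(P1, x0, x1, y0, y1)
        return rect(P2, x0, x1, y0, y1) - s * s / ((x1 - x0) * (y1 - y0))

    @lru_cache(maxsize=None)
    def solve(x0, x1, y0, y1, b):
        best = sse(x0, x1, y0, y1)
        if b == 1:
            return best
        for b_first in range(1, b):                  # buckets given to the first piece
            for x in range(x0 + 1, x1):              # top level cut across the first axis
                best = min(best, solve(x0, x, y0, y1, b_first)
                                 + solve(x, x1, y0, y1, b - b_first))
            for y in range(y0 + 1, y1):              # top level cut across the second axis
                best = min(best, solve(x0, x1, y0, y, b_first)
                                 + solve(x0, x1, y, y1, b - b_first))
        return best

    return float(solve(0, A.shape[0], 0, A.shape[1], buckets))

grid = [[1, 1, 5, 5],
        [1, 1, 5, 5],
        [3, 3, 7, 7],
        [3, 3, 7, 7]]
one_bucket = best_hierarchical_histogram(grid, 1)
four_buckets = best_hierarchical_histogram(grid, 4)
```

Restricting the candidate cuts to the boundaries of $H_{rob}$, as the text proposes, is what keeps the running time polynomial in $S$ rather than in $N$.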

That completes the description of the overall algorithm. At the high level, it 
has two steps: (1) Finding the representation R which is robust as defined by 
Theorem 3, and (2) finding the best 4B bucket hierarchical histogram for R using 
dynamic programming. The second step takes time and space only polynomial 
in S, hence it is polylogarithmic in input size N. 

This algorithm however is idealized because in the streaming model, we have 
not shown how to implement Step 1. In fact, as stated, it involves finding some 
of the largest TPW terms, which provably needs $\Omega(N)$ space under updates. In 
the next section we will fix this difficulty. 



3 Details 

3.1 Streaming Data 

First consider the streaming setting. We need to find the robust representation 
R. We will now follow the framework in [5] where the algorithm uses sketches 
to trace the effect of updates on A and uses group testing techniques to isolate 
nearly the best top wavelet coefficients. We need to generalize it to TPWs now. 
We present an overview here for some insight. 

Formally write $A = \sum_j \langle A, \psi_j\rangle\psi_j$, where $\{\psi_j\}$ is an orthonormal basis of 
TPW functions. When an update for $A$ arrives, decompose it into $O(\log^2(N))$ 
updates to TPWs. Partition the set of $N^2$ TPW basis functions into $O(B^2/\epsilon^4)$ 
buckets, pairwise at random; each TPW coefficient that is significant in $A$, i.e., 
such that $\langle A, \psi\rangle^2 \ge \Theta(\epsilon^2/B)\|A\|^2$, becomes overwhelming in some bucket, i.e., 
$\langle A, \psi\rangle^2 \ge (2/3)\|A^{(j)}\|^2$, where $A^{(j)}$ is $A$ projected onto the TPWs in the $j$'th 
bucket. Order the TPWs arbitrarily. For each $j$, further partition the TPWs in 
the $j$'th bucket into two groups, $\log(N^2)$ different ways: the first half versus 
the second half of the TPWs (i.e., those whose label has most significant bit 
0 versus 1), the first and third quarters versus the second and fourth quarters 
(i.e., based on the second most significant bit), etc. Let $A^{(j,k,\pm)}$ denote one 
of the two groups in the split on the $k$'th most significant bit, $k < \log(N^2)$, for 
the $j$'th bucket, $j < O(B^2/\epsilon^4)$. Let random vector $X$, indexed by the TPWs $\psi$, take 
values $\pm 1$, 4-wise independently at random. We track $\langle A^{(j,k,\pm)}, X\rangle$ in the TPW 
domain, which can be done in small space, by generating components of $X$ as 
needed from a small random seed. (That is, relying on the orthonormality of the 
TPW transform, $\sum_t A(t)X(t) = \langle A, X\rangle = \sum_\psi \langle A, \psi\rangle\langle\psi, X\rangle$. We have converted 
input, updates to $A$, into updates to the $\langle A, \psi\rangle$'s. We use a generator to produce 
$\langle\psi, X\rangle$ directly; we do not ever instantiate $X(t)$.) Finally, we repeat this for 
$O(\log(1/\delta)/\epsilon^4)$ independent copies of $X$. This is how we process updates; we 
now turn to processing histogram queries. 

To find significant TPW coefficients, we proceed as follows. The median of 
$O(\log(1/\delta))$ copies of the mean of $O(1/\epsilon^4)$ copies of $\langle A^{(j,k,\pm)}, X\rangle^2$ gives [1] 
a good approximation $(1 \pm \epsilon^2)\|A^{(j,k,\pm)}\|_2^2$ to $\|A^{(j,k,\pm)}\|_2^2$ with 
probability at least $1 - \delta$. This suffices to find the larger of $\|A^{(j,k,+)}\|^2$ and 
$\|A^{(j,k,-)}\|^2$, which suffices to find the $k$'th most significant bit of the label of 
the overwhelming TPW coefficient in $A^{(j)}$, if $A^{(j)}$ has an overwhelming TPW 
coefficient. This suffices to find the labels of significant TPW coefficients in $A$. 
Once we have found $\psi$, we can optimize $\|A - c\psi\|^2$ for $c$, since we need to 
optimize the median of $O(\log(1/\delta))$ quadratics in $c$; the optimum occurs at a 
minimum of one of the quadratics or at the intersection of a pair of quadratics [5]. 

The resulting output approximates the energy of the top $k$ TPW terms for 
$A$, i.e., the sum of the squares of these TPW terms is within a factor $1 + \epsilon$ of that of the $k$ 
terms with the largest squared sum. Now we can revisit the idealized algorithm in the 
previous section, and argue that with this approximation, the arguments still go 
through and one gets Theorem 1. These details lie in the framework in [5] and 
are omitted here. 
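
The bit-by-bit identification of an overwhelming coefficient can be illustrated in isolation. The sketch below makes a strong simplifying assumption: the energies of the two groups in each split are computed exactly from the bucket's coefficients, whereas the algorithm above only estimates them from sketches. The function name `locate_overwhelming` is hypothetical.

```python
import numpy as np

def locate_overwhelming(bucket):
    """Recover the label of a coefficient holding most of the bucket's energy.

    For each bit position, the coefficients are split by that bit of their
    label; the group with more energy must contain the overwhelming one.
    Assumes len(bucket) is a power of 2.
    """
    energy = np.asarray(bucket, dtype=float) ** 2
    n = len(energy)
    bits = n.bit_length() - 1
    labels = np.arange(n)
    label = 0
    for k in range(bits - 1, -1, -1):                # most significant bit first
        bit_is_one = ((labels >> k) & 1) == 1
        if energy[bit_is_one].sum() > energy[~bit_is_one].sum():
            label |= 1 << k
    return label

bucket = np.full(16, 0.1)     # small background coefficients
bucket[11] = 5.0              # one overwhelming coefficient, at label 11
found = locate_overwhelming(bucket)
```

Because the overwhelming coefficient holds more than half of the bucket's energy, the group containing it wins every comparison, so $\log$ of the bucket size comparisons recover its full label.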

3.2 Static Data 

We now turn to static data. We show how to find the significant TPW terms, 
then use the above techniques. 

Suppose the per-item time in the dynamic case is some $T \le (B\log(N)/\epsilon)^{O(1)}$, 
where we assume $T$ is a power of 2. We project $A$ onto two orthogonal subspaces, 
$X$ and $Y$. The space $X$ has a basis of TPWs $\psi = \psi_1 \otimes \psi_2$ such that $|supp(\psi_1)| \le T$; 
$Y$ is the complementary subspace, so that $Y$ has a basis of TPWs $\psi_1 \otimes \psi_2$ 
such that $|supp(\psi_1)| > T$. It follows that an alternate basis for $Y$ consists of the 
characteristic functions on each rectangle $I \times \{j\}$, where $I$ is a dyadic interval 
of size at least $T$ and $\{j\}$ is any point. Note that $X$ has dimension $(1 - 1/T)N^2$ and $Y$ has 
dimension $N^2/T$. We will show how to find significant coefficients in $X$ and in 
$Y$ separately. 

We will read the data in the following order. 

for (i = 0; i < N/T; i++) 
    for (j = 0; j < N; j++) 
        for (k = 0; k < T; k++) 
            Read(A(iT + k, j)); 

We regard the inner loop as reading a vector of $T$ components of $A$, that we 
treat as a unit, called a $T$-vector. One can project the signal onto subspace $Y$, on 
the fly, by summing the elements of each $T$-vector; this projection takes linear 
time $O(N^2)$. It follows that we can use the dynamic algorithm on $Y$ to track 
significant coefficients among the $N^2/T$ total coefficients, with time $T$ each, for 
total time $O(N^2)$, as desired. In slightly more detail, after summing a $T$-vector 
supported on some $I \times \{j\}$, we find the $O(\log(N/T))$ wavelets $\psi$ in $Y$ such 
that $supp(\psi)$ intersects $I \times \{j\}$. The contribution to each such $\psi$ 
depends on the sum of the $T$-vector and not otherwise on the 
constituent signal values. We regard $Y$ as $N^2/T$ TPWs on an $N/T \times N$ array of 
point values, where each point value is a normalized sum over intervals $I \times \{j\}$ 
of the original $N \times N$ array, where $I$ is a dyadic interval of length $T$. 
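
The reading order and the on-the-fly projection onto $Y$ can be sketched as follows. The names `t_vectors` and `project_onto_Y` are hypothetical, and the normalization by $\sqrt{T}$ is an assumption made here so that each point value is the coefficient of a unit-norm characteristic function; the text only says the sums are normalized.

```python
import numpy as np

def t_vectors(A, T):
    """Yield (i, j, T-vector) in the layout order used for the single pass."""
    N = A.shape[0]
    for i in range(N // T):
        for j in range(N):
            # Inner loop of the pseudocode: A(iT + k, j) for k = 0, ..., T - 1.
            yield i, j, A[i * T:(i + 1) * T, j]

def project_onto_Y(A, T):
    """Build the (N/T) x N array of normalized T-vector sums in one pass."""
    N = A.shape[0]
    Y = np.zeros((N // T, N))
    for i, j, vec in t_vectors(A, T):
        Y[i, j] = vec.sum() / np.sqrt(T)   # assumed normalization
    return Y

A = np.arange(64, dtype=float).reshape(8, 8)
Y = project_onto_Y(A, 4)
```

Each entry of `Y` depends only on the sum of one $T$-vector, which is why the originals can be discarded once summed.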

We now show how to find the significant coefficients on subspace $X$. First, 
we show how to produce a stream of all $(1 - 1/T)N^2$ coefficients in $X$ in linear 
time $O(N^2)$ and space $O(T\log(N))$. Consider a $\psi_2$ with support size 2. All the 
TPWs of the form $\psi_1 \otimes \psi_2$ are entries in the vector difference of two consecutive 
$T$-vectors; these can be output. All of the other $O(\log(N))$ TPWs that involve 
these two consecutive $T$-vectors depend only on the vector sum of the $T$-vectors, 
so, after the $T$ TPWs are output from the vector difference, we can sum the 
$T$-vectors, keep the sum, and discard the originals. A similar situation occurs at 
each level of the Haar wavelet hierarchy. Thus, for each value of $i$ and $j$ in the 
outer and middle loop, we need to store $O(\log(N))$ partial TPWs whose support 
includes the $T$-vector associated with $i$ and $j$; one can maintain this information 
as $i$ and $j$ advance. 

Finally, we can collect the top $S$ TPWs from the stream of all TPWs in space 
$O(S)$ and linear time $O(N^2)$. As an invariant, maintain a buffer of size $4S$ with 
between $S$ and $3S$ TPWs. In time $O(S)$, fill the buffer, find (e.g., randomly) a 
buffer element with rank between $S$ and $3S$, and discard buffer elements with 
lower rank. This restores the invariant and, overall, keeps the top $S$ items. This 
concludes the algorithm description, and gives Theorem 2. 
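
The buffering scheme can be sketched in a few lines. This is a simplified illustration: instead of the randomized selection of a pivot with rank between $S$ and $3S$, it prunes the full buffer to its $2S$ largest entries with `heapq.nlargest`, which costs $O(S\log S)$ per prune rather than $O(S)$. The function name `top_s` is hypothetical.

```python
import heapq

def top_s(stream, S):
    """Keep the S coefficients of largest magnitude using a buffer of size 4S."""
    buf = []
    for value in stream:
        buf.append(value)
        if len(buf) >= 4 * S:
            # Prune: keep an element count between S and 3S (here exactly 2S),
            # discarding everything of lower rank.
            buf = heapq.nlargest(2 * S, buf, key=abs)
    return heapq.nlargest(S, buf, key=abs)

# A stream that visits every integer from -50 to 50 once, in scrambled order.
stream = [((i * 37) % 101) - 50 for i in range(101)]
largest = top_s(stream, 5)
magnitudes = sorted(abs(v) for v in largest)
```

An element among the overall top $S$ is never discarded, because every prune keeps at least the $S$ largest entries seen so far.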

The details we have outlined above are extensive. They involve simplifica- 
tion and extension of the calculations in [5]. In this writeup, we have focused 
mainly on the idealized algorithm so the reader gets the intuitive structure of 
the algorithm; the full proof can be obtained from the authors. 

4 Concluding Remarks 

We have presented the first known algorithm to summarize a multidimensional 
data stream using polylog space, and process each update or produce histogram 
estimation in polylog time. Since powerful data stream applications such as 
IP traffic analysis generate d-dimensional data for small d, this solution is of 
great interest. However, our result is mainly theoretical, and it emphasizes the 
key ideas needed to get the theorem. In practice, these ideas have to be engineered 
appropriately to get effective solutions. For example, getting the significant 
TPWs can be replaced by algorithms for finding significant range sums in 
each dimension and postprocessing the cross product, which will help decrease 
the per-update processing time. The second step can be omitted altogether, and 
the robust representation itself can be used to approximate data distributions. 
Other heuristics also suggest themselves, and we leave it open to explore them 
for deriving practical algorithms. 

Abstract. Tagging schemes have been used in security protocols to 
ensure that the analysis of such protocols can work with messages of 
bounded length. When the set of nonces is bounded, this leads to decid- 
ability of secrecy. In this paper, we show that tagging schemes can be 
used to obtain decidability of secrecy even in the presence of unbound- 
edly many nonces. 



1 Background 

Security protocols are specifications of communication patterns which are in- 
tended to let agents share secrets over a public network. They are required 
to perform correctly even in the presence of malicious intruders who listen to 
the message exchanges that happen over the network and also manipulate the 
system (by blocking or forging messages, for instance). An obvious correctness 
requirement is that of secrecy: an intruder cannot read the contents of a message 
intended for others. 

The presence of intruders necessitates the use of encrypted communication. 
It has been widely acknowledged that even if perfect cryptographic tools are 
used, desired security goals may not be met, due to logical flaws in the design of 
protocols. Thus automatic verification of security protocols is an important and 
worthwhile enterprise. This is complicated by the fact that security protocols 
are in general infinite state systems. As such, it is to be expected that it is 
not possible to verify even simple properties like secrecy of such systems. It 
has been formally proved in ([7], [9], [1]) that in fact, the secrecy problem is 
undecidable. The prominent sources of undecidability are unbounded message 
length and unbounded number of nonces. 

The undecidability results seem to be at variance with the high degree of 
success achieved in verifying not just secrecy but also other more complicated 
properties of security protocols in practice. Hence there have been many attempts 
to prove decidability by imposing reasonable restrictions on the model. [7] shows 
that when both message length and the number of nonces are bounded, the secrecy 
problem is DEXPTIME-complete. [11] and [16] essentially place bounds on the 
number of sessions that can occur in any run of the protocol, thereby obtaining 
decidability. [10] proves decidability for a syntactic subclass and our work is 
closest in spirit to this work. 

* We thank the anonymous referees for detailed comments, which have helped to 
improve the presentation greatly, both in terms of the modelling and the proofs. 

In earlier work, we separately studied the secrecy problem in the setting of 
bounded-length messages ([13]) and in the setting of boundedly many nonces 
([14]), showing decidability for subclasses of protocols in both cases. In this paper, 
we prove decidability for the subclass of tagged protocols without assuming 
any external bounds. The tagging scheme ensures primarily that no two en- 
crypted subterms of distinct communications in the protocol specification are 
unifiable. Similar schemes have been used in [2] to prove the termination of their 
verification algorithm, and in [8] to prevent type-flaw attacks. 

Our decidability proof works by first tackling the problem of unbounded 
message length and then the problem of unboundedly many nonces. Message 
length can get unbounded when the intruder substitutes nonatomic terms for 
atomic terms. We show that our tagging scheme ensures that the honest agents 
do not make a critical use of such terms for learning new information, and thus 
it suffices for verification of secrecy to consider only well-typed runs where nonces 
are instantiated only with atomic data. We next show that whenever a run of 
a tagged protocol has a send action a such that none of the succeeding receive 
actions has encrypted terms in common with a, then we can eliminate a from 
the run and perform a systematic renaming of the nonces to arrive at a run of 
shorter length which is leaky iff the original run is. We also prove that repeating 
this process yields us a run of bounded length, and show that it suffices for 
decidability. A proof outline is provided in Section 3, the details of which can be 
found in [15]. 

The technique used here should be contrasted with approaches which impose 
restrictions on the use of the tupling operator ([1], [6]), or use more stringent 
admissibility criteria like [4] which uses techniques from tree automata theory to 
show decidability for the class of protocols in which every agent copies at most 
one piece of any message it receives into any message it sends, or approaches 
like [3], where an abstraction of nonces is used to prove the termination of a 
verification algorithm for a large class of protocols. Apart from decidability, the 
model presented here has other interesting features like send-admissibility, which 
formalises a notion of reasonableness for protocols, and synth and analz proofs, 
which formalise how messages are generated and received terms are analyzed. 

2 Security Protocol Modelling 

In this section we briefly present our model of security protocols. Our modelling 
is close to the inductive approach of [12] in many respects. A more detailed 
presentation can be found in [15]. 

Actions 

We assume a finite set of agents $Ag$ with a special intruder $I \in Ag$. The set of 
honest agents, denoted $Ho$, is defined to be $Ag \setminus \{I\}$. We assume an infinite set 
of nonces $N$. The set of keys $K$ is given by $K_0 \cup K_1$ where $K_0$ is an infinite set 
and $K_1 = \{k_{AB}, privk_A, pubk_A \mid A, B \in Ag, A \ne B\}$. $pubk_A$ is $A$'s public key 
and $privk_A$ is its private key. $k_{AB}$ is the (long-term) shared key of $A$ and $B$. 
For each $A \in Ag$, $K_A \stackrel{\text{def}}{=} \{k_{AB}, k_{BA}, pubk_A, privk_A, pubk_B \mid B \in Ag, B \ne A\}$. 
For $k \in K$, $\overline{k}$, the inverse key of $k$, is defined as follows: $\overline{pubk_A} = privk_A$ and 
$\overline{privk_A} = pubk_A$ for all $A \in Ag$, and $\overline{k} = k$ for all the other keys. The set of basic 
terms $\mathcal{T}_0$ is defined to be $K \cup N \cup Ag$. 

The set of information terms is defined to be 

$$\mathcal{T} ::= m \mid (t_1, t_2) \mid \{t\}_k$$

where $m$ ranges over $\mathcal{T}_0$ and $k$ ranges over $K$. 

The notion of subterm of a term is the standard one: $ST(m) = \{m\}$ for 
$m \in \mathcal{T}_0$; $ST((t_1, t_2)) = \{(t_1, t_2)\} \cup ST(t_1) \cup ST(t_2)$; and $ST(\{t\}_k) = \{\{t\}_k\} \cup 
ST(t) \cup ST(k)$. $t'$ is an encrypted subterm of $t$ if $t' \in ST(t)$ and $t'$ is of the form 
$\{t''\}_k$. $EST(t)$ denotes the set of encrypted subterms of $t$. 
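
The term grammar and the subterm functions can be made concrete with a small sketch. The encoding is an assumption chosen for illustration: basic terms are strings, a pair is the tuple `("pair", t1, t2)`, and an encryption is the tuple `("enc", t, k)`; the helper names `pair`, `enc`, `ST` and `EST` mirror the notation above but are otherwise hypothetical.

```python
def pair(t1, t2):
    """The pair (t1, t2)."""
    return ("pair", t1, t2)

def enc(t, k):
    """The encrypted term {t}_k."""
    return ("enc", t, k)

def ST(t):
    """Set of subterms of t, following the inductive definition."""
    if isinstance(t, str):                 # basic term: key, nonce or agent name
        return {t}
    _tag, left, right = t                  # pair components, or body and key
    return {t} | ST(left) | ST(right)

def EST(t):
    """Set of encrypted subterms of t."""
    return {s for s in ST(t) if not isinstance(s, str) and s[0] == "enc"}

inner = enc("n", "kAB")
message = enc(pair("A", inner), "pubkB")
subterms = ST(message)
encrypted = EST(message)
```

For this `message`, the subterms are the message itself, the pair, the inner encryption, and the four basic terms `A`, `n`, `kAB` and `pubkB`.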

An action is either a send action of the form $A!B\colon(M)t$ or a receive action of 
the form $A?B\colon t$ where: $A \in Ho$, $B \in Ag$ and $A \ne B$; $t \in \mathcal{T}$; and $M \subseteq ST(t) \cap N$. 
In a send action of the form $A!B\colon(M)t$, $M$ is the set of nonces freshly generated 
by $A$ just before sending $t$. For simplicity of notation, we write $A!B\colon t$ instead of 
$A!B\colon(\emptyset)t$. The set of all actions is denoted by $Ac$. 

Note that we do not have explicit intruder actions in the model. As will be 
clear from the definition of updates caused by actions, every send action is implicitly 
considered to be an instantaneous receive by the intruder, and similarly, 
every receive action is considered to be an instantaneous send by the intruder. 
Thus the agent $B$ is (merely) the intended receiver in $A!B\colon(M)t$ and the purported 
sender in $A?B\colon t$. 

For $a$ of the form $A!B\colon(M)t$, $term(a) \stackrel{\text{def}}{=} t$ and $NT(a) \stackrel{\text{def}}{=} M$. For $a$ of 
the form $A?B\colon t$, $term(a) \stackrel{\text{def}}{=} t$ and $NT(a) \stackrel{\text{def}}{=} \emptyset$. $NT(a)$ stands for new terms 
generated during action $a$. The notation is appropriately extended so that we can 
talk of $terms(\eta)$ and $NT(\eta)$ for $\eta \in Ac^*$. $ST(a)$ and $EST(a)$ have the obvious 
meanings, $ST(term(a))$ and $EST(term(a))$ respectively. $\eta|A$, $A$'s view of $\eta$, is 
the subsequence of $\eta$ obtained by projecting it down to the set of $A$-actions. 

Protocols 

Definition 21 An information state $s$ is a tuple $(s_A)_{A \in Ag}$ where $s_A \subseteq \mathcal{T}$ for 
each agent $A$. $S$ denotes the set of all information states. For a state $s$, we define 
$ST(s)$ to be $\bigcup_{A \in Ag} ST(s_A)$. 

Definition 22 A protocol is a pair $Pr = (C, \eta)$ where $C$, the set of constants 
of $Pr$, denoted $CT(Pr)$, is a subset of $\mathcal{T}_0$, and $\eta \in Ac^+$ is the body of $Pr$. 

Given a protocol $Pr = (C, \eta)$, $Roles(Pr)$, the set of roles of $Pr$, is defined to 
be the set $\{\eta|A \mid A \in Ag \text{ and } \eta|A \ne \varepsilon\}$. 

In the literature, protocols are informally specified as a sequence of communications 
of the form $A \to B\colon t$. Such protocols can be presented in the above 




366 R. Ramanujam and S.P. Suresh 



formalism by splitting each communication into a send action and a matching 
receive action. Protocols which are presented as a finite set of roles can also be 
presented in the above formalism. 

Definition 23 Given a protocol $Pr$, we define the initial state of $Pr$, denoted 
$s_0(Pr)$, to be $(T_A)_{A \in Ag}$ where for all $A \in Ho$, $T_A = CT(Pr) \cup K_A$ and $T_I = 
CT(Pr) \cup K_I \cup \{z\}$, where $z$ is a fixed nonce which is assumed to be different from 
all the nonces in $CT(Pr)$. 

As we have mentioned earlier, we do not explicitly model intruder actions. 
Thus we do not explicitly model the phenomenon of the intruder generating new 
nonces in the course of a run, as is done in some other models (for instance, 
[7]). An alternative would be to provide an arbitrary set of nonces and keys to 
the intruder in the initial state. We follow the approach of just providing the 
intruder with the fixed nonce z in the initial state. It is a symbolic name for the 
set of new data the intruder might generate in the course of a run. This suffices 
for the analysis we perform in our proofs later. We will ensure as we develop the 
model that z is not generated as a fresh nonce by any honest agent in the course 
of a run of Pr. 

A substitution $\sigma$ is a map which maps nonces to arbitrary terms, keys to 
keys and agent names to agent names. Substitutions are extended to terms, 
actions, and sequences of actions in a straightforward manner. A substitution $\sigma$ 
is said to be well-typed iff for $n \in N$, $\sigma(n) \in N$. A substitution $\sigma$ is said to be 
suitable for an action $a$ iff it maps each distinct nonce (or key) in $NT(a)$ to a 
distinct nonce (or key, as the case may be), and has disjoint ranges for $NT(a)$ 
and $ST(a) \setminus NT(a)$. $\sigma$ is suitable for $a_1 \cdots a_\ell$ iff for all $i \le \ell$, $\sigma$ is suitable for $a_i$. 
$\sigma$ is said to be suitable for a protocol $Pr$ if $\sigma(t) = t$ for all constants $t \in CT(Pr)$. 

Given a protocol $Pr$, a triple $(\eta, \sigma, lp)$ is an event of $Pr$ iff $\eta \in Roles(Pr)$, $\sigma$ is 
a substitution suitable for $Pr$ and $\eta$, and $1 \le lp \le |\eta|$. $Events(Pr)$ is the set of all 
events of $Pr$. An event $(\eta, \sigma, lp)$ of $Pr$ is said to be well-typed iff $\sigma$ is well-typed. 
For an event $e = (\eta, \sigma, lp)$ of $Pr$ with $\eta = a_1 \cdots a_\ell$, $act(e) \stackrel{\text{def}}{=} \sigma(a_{lp})$. If $lp < |\eta|$ 
then $(\eta, \sigma, lp) \to_\ell (\eta, \sigma, lp + 1)$. For two events $e$ and $e'$ of $Pr$, $e' \in LP(e)$, the 
local past of $e$, iff $e' \to_\ell^+ e$. For any event $e$ of $Pr$, $NT(e)$ will be used to denote 
$NT(act(e))$ and similarly for $term(e)$, $ST(e)$, $EST(e)$, etc. 

Runs of Protocols 

Definition 24 A sequent is of the form $T \vdash t$ where $T \subseteq \mathcal{T}$ and $t \in \mathcal{T}$. 

An analz-proof (synth-proof) $\pi$ of $T \vdash t$ is an inverted tree whose nodes 
are labelled by sequents and connected by one of the analz-rules (synth-rules) in 
Figure 1, whose root is labelled $T \vdash t$, and whose leaves are labelled by instances 
of the $Ax_a$ rule ($Ax_s$ rule). For a set of terms $T$, $analz(T)$ ($synth(T)$) is the set 
of terms $t$ such that there is an analz-proof (synth-proof) of $T \vdash t$. 

For ease of notation, $synth(analz(T))$ is denoted by $\overline{T}$. 



Fig. 1. analz and synth rules. 

Definition 25 The notions of an action enabled at a state and update of a state 
on an action are defined as follows: 

– $A!B\colon(M)t$ is enabled at $s$ iff $t \in \overline{s_A \cup M}$, and $M \cap ST(s) = \emptyset$. 

– $A?B\colon t$ is enabled at $s$ iff $t \in \overline{s_I}$. 

– $update(s, A!B\colon(M)t) \stackrel{\text{def}}{=} s'$ where $s'_A = s_A \cup M \cup \{t\}$, $s'_I = s_I \cup \{t\}$, and for 
all $C \in Ag \setminus \{A, I\}$, $s'_C = s_C$. 

– $update(s, A?B\colon t) \stackrel{\text{def}}{=} s'$ where $s'_A = s_A \cup \{t\}$ and for all $C \in Ag \setminus \{A\}$, 
$s'_C = s_C$. 

$update(s, \varepsilon) = s$, $update(s, \eta \cdot a) = update(update(s, \eta), a)$. 

Given a protocol $Pr$, and a sequence $\xi = e_1 \cdots e_k$ of events of $Pr$, $infstate(\xi)$ is 
defined to be $update(s_0(Pr), act(e_1) \cdots act(e_k))$. We say that an event $e$ of $Pr$ is 
enabled at $\xi$ iff $LP(e) \subseteq \{e_1, \ldots, e_k\}$ and $act(e)$ is enabled at $infstate(\xi)$. 
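
The interplay of $analz$, $synth$ and $update$ can be illustrated with a small sketch. It rests on loudly stated assumptions: the rules of Figure 1 are not reproduced in this text, so the code assumes the usual ones (split pairs, decrypt with a known inverse key, build pairs, encrypt with a known key); terms are encoded as strings and tagged tuples; and all names (`analz`, `derivable`, `update`, the key naming scheme) are hypothetical.

```python
def pair(t1, t2):
    return ("pair", t1, t2)

def enc(t, k):
    return ("enc", t, k)

def inverse(k):
    """Inverse key: pubkX <-> privkX; any other key is its own inverse."""
    if k.startswith("pubk"):
        return "privk" + k[len("pubk"):]
    if k.startswith("privk"):
        return "pubk" + k[len("privk"):]
    return k

def analz(terms):
    """Close a set of terms under pair splitting and decryption (assumed rules)."""
    known = set(terms)
    changed = True
    while changed:
        changed = False
        for t in list(known):
            if isinstance(t, str):
                continue
            tag, body, second = t
            if tag == "pair":
                new = {body, second}
            elif inverse(second) in known:       # tag == "enc", second is the key
                new = {body}
            else:
                new = set()
            if not new <= known:
                known |= new
                changed = True
    return known

def _synth(t, known):
    """Can t be built from `known` by pairing and encrypting (assumed rules)?"""
    if t in known:
        return True
    if isinstance(t, str):
        return False
    _tag, first, second = t
    return _synth(first, known) and _synth(second, known)

def derivable(t, terms):
    """Membership of t in synth(analz(terms))."""
    return _synth(t, analz(terms))

def update(state, action):
    """State update for ("send" | "recv", agent, other, fresh nonces, term)."""
    kind, agent, _other, fresh, t = action
    new = {name: set(known) for name, known in state.items()}
    new[agent] |= {t}
    if kind == "send":                           # the intruder sees every send
        new[agent] |= set(fresh)
        new["I"] |= {t}
    return new

state = {"A": {"pubkI", "privkA"}, "B": {"privkB"}, "I": {"privkI", "pubkB", "z"}}
state = update(state, ("send", "A", "I", {"n"}, enc(pair("A", "n"), "pubkI")))
forged = enc(pair("A", "n"), "pubkB")
can_forge = derivable(forged, state["I"])        # receive B?A: forged is enabled
```

After the send, the intruder can decrypt the message with `privkI`, learn the nonce `n`, and re-encrypt it for `B`, so a receive action by `B` of the forged term would be enabled in this sketch.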

Definition 26 Given a protocol Pr, the set o/runs of Pr, denoted by TZ{Pr), is 
defined to be the set of all sequences e\ - - - eu of events of Pr such that for all 
i : 1 < i < k, 6i is enabled at e\ - - - ei-\. A run is said to be well-typed iff every 
event occurring in it is well-typed. 

Well-Formed Protocols 

{A\B-. {M)t, CID: t') is said to be a matching send-receive pair iS A = D, B = C, 
and t = t' . Note that we require syntactic equality of the terms t and t' rather 
than just unifiability. 

Given a protocol Pr, a sequence of actions rj = a\ - - - an is said to be send- 
admissible with respect to Pr iff for all i < £, if Oi is a send action then at is 
enabled at update {so{Pr), oi • • • Oi-i). 

Definition 27 A well-formed protocol is a protocol Pr = (C, oi&i • • • aibi) where 
{ai,bi) is a matching send-receive pair for all i : 1 < i < £ and rj is send- 
admissible with respect to Pr. 
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Well-formed protocols formalise a notion of “reasonableness” of protocol 
specifications. Almost all the standard protocols studied in the literature are 
well-formed. The following useful fact follows easily from the definition of well- 
formed protocols and from some basic properties of the synth and analz operators. 



Proposition 28 Suppose that $Pr = (C, \eta)$ is a well-formed protocol. Then for
all roles $\zeta$ of Pr and $\sigma$ suitable for $\zeta$ and Pr, $\zeta$ and $\sigma(\zeta)$ are send-admissible with
respect to Pr.

Tagged Protocols 

While well-formed protocols enforce a reasonableness condition at the level of 
protocol specifications, we must note that they still allow for quite unreasonable 
behaviours. Substituting encrypted terms for nonces can give the intruder the 
ability to circumvent the protocol. For instance, a communication of the form
$A \to B : \{(A, \{x\}_B)\}_B$ in the protocol allows the intruder to capture it and send
it on to $B$ as: $I \to B : \{(I, \{\{(A, \{x\}_B)\}_B\}_B)\}_B$. This goes against the reasonable
requirement that B expects only terms of encryption depth 2 whereas here B gets
a term of depth 3. We thus look for mechanisms that enforce only “reasonable 
runs”. Tagging is one such mechanism that seeks to distinguish between terms 
of different encryption depth as above. More specifically, tags are just constants 
which act as message identifiers and are attached to some of the encrypted 
subterms of messages which are communicated during a run. The use of tags has 
the effect of preventing the intruder from passing off a term $\sigma(\{t\}_k)$ as $\sigma'(\{t'\}_{k'})$
in some run of a protocol while $\{t\}_k$ and $\{t'\}_{k'}$ are intended to be distinct terms
in the protocol specification. We also use tagging to associate every receive action 
occurring in a run with its corresponding send (if there exists one). 

Definition 29 A well-formed protocol $Pr = (C, \eta)$ with $\eta = a_1 b_1 \cdots a_\ell b_\ell$ is called
a tagged protocol iff: for all $t \in EST(\eta)$ there exists $c_t \in C$, and for all $i \le \ell$
there exists $n_i \in NT(a_i)$ such that:

– for all $i, j \le \ell$, $t \in EST(a_i)$, and $t' \in EST(a_j)$: if $c_t = c_{t'}$ then $t = t'$ and
$i = j$, and

– for all $i \le \ell$ and all $t \in EST(a_i)$, $t = \{(c_t, (n_i, u))\}_k$ for some $u$ and $k$.
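
As a hedged illustration (not taken from the paper), consider how the tagging format of Definition 29 might apply to a single send action whose term is $\{(A, \{x\}_B)\}_B$. The constants $c_1, c_2$ and the tag nonce $n_1$ are hypothetical names chosen for this example; the only requirements used are that distinct encrypted subterms get distinct constants and that all encrypted subterms of one send action share that action's nonce.

```latex
% Untagged term of the send action a_1 (two encrypted subterms):
%   outer: \{(A, \{x\}_B)\}_B      inner: \{x\}_B
\[
  \{(A, \{x\}_B)\}_B
  \;\longmapsto\;
  \{(c_1, (n_1, (A, \{(c_2, (n_1, x))\}_B)))\}_B
\]
% Both encrypted subterms now have the shape \{(c_t, (n_i, u))\}_k :
%   outer: c_t = c_1,\; n_i = n_1,\; u = (A, \{(c_2, (n_1, x))\}_B),\; k = B
%   inner: c_t = c_2,\; n_i = n_1,\; u = x,\; k = B
```

With such tags, an honest receiver expecting the inner term can reject any encrypted term whose leading constant is not $c_2$, which is how a deeper encrypted term substituted for the nonce $x$ would be detected.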

Most of the standard protocols occurring in the literature (see [5] for example)
can be easily tagged to obtain “equivalent protocols”, such that for any run $\xi$
of the original protocol which involves only honest agents, the tagged version
of $\xi$ is a run of the transformed protocol, and for all runs $\xi$ of the transformed
protocol, the untagged version of $\xi$ is a run of the original protocol. (Thus the
transformation does not limit the honest agents’ capabilities while at the same
time not introducing more attacks.) The protocols for which this transformation
cannot be effected are those which contain “blind copies”, like the Woo-Lam
protocol $\Pi$ (as presented in [5]). It is to be noted that the schemes presented
in [8] and [2] (which, with reference to Definition 29, are equivalent to using
just the $c_t$ tags to distinguish between distinct terms in $EST(\eta)$) work even
for protocols with “blind copies”; in fact those schemes work for all well-formed
protocols. An important point worth noting here is that including the tags in
the protocol specification stage rather than later, in the run generation stage,
means that the reasonableness of runs is enforced by checks performed by the
honest participants of the protocol.

Tagging each send action with a new nonce might seem a costly operation,
since it is nontrivial to keep generating many distinct, unguessable random numbers.
But the proofs (in particular, the proof of item 2 of Proposition 210, which
is the only place where this property of tagged protocols is used) only require the
fact that the $n_i$’s are instantiated with distinct values for distinct substitutions,
and not the fact that they are unguessable values. Thus the $n_i$’s are playing the
role of sequence numbers, and are as such easy to implement.

The following property, useful for proofs later, can be easily seen to be a 
direct consequence of the definition of tagged protocols. 

Proposition 210 Suppose $Pr = (C, a_1 b_1 \cdots a_\ell b_\ell)$ is a tagged protocol. Then the
following statements hold:

– For all $\sigma, \sigma'$ suitable for Pr and for all $i, j \le \ell$, $t \in EST(a_i)$, $t' \in EST(a_j)$,
if $\sigma(t) = \sigma'(t')$ then $t = t'$ and $i = j$.

– Suppose $e_1 \cdots e_k$ is a well-typed run of Pr. For all receive events $e_j$ ($j \le k$),
there is at most one send event $e_i$ such that $EST(e_i) \cap EST(e_j) \neq \emptyset$.

The Secrecy Problem 

Definition 211 A basic term $m \in \mathcal{T}_0$ is said to be secret at state $s$ iff there
exists $A \in Ho$ such that $m \in analz(s_A) \setminus analz(s_I)$. Given a protocol Pr and
$\xi \in \mathcal{R}(Pr)$, $m$ is said to be secret at $\xi$ if it is secret at $infstate(\xi)$. $\xi$ is leaky iff
there exists a basic term $m$ and a prefix $\xi'$ of $\xi$ such that $m$ is secret at $\xi'$ and
not secret at $\xi$. The secrecy problem is the problem of determining for a given
protocol Pr whether some run of Pr is leaky.

Thus we say that a run is leaky if some atomic term is secret at some inter- 
mediate state of the run but is revealed to the intruder at the end of the run. 
It is possible that there are protocols for which leaks of the above form do not 
constitute a breach of security. A more general notion would be to allow the 
user to specify certain secrets which should not be leaked and check for such 
leaks. We believe that the techniques we use here can be adapted to prove the 
decidability of the more general problem as well. 

While the general secrecy problem has been proved to be undecidable in 
various settings ([7], [9]), the main result of this paper is the following decidability 
result. 

Theorem 212 The secrecy problem for tagged protocols is decidable. 

The theorem follows from a series of observations, which will be proved in the 
next section. 

1. If a tagged protocol has a leaky run, then it has a well-typed leaky run.
2. If a tagged protocol has a well-typed leaky run, then it has a good well-typed
leaky run.

3. All good well-typed runs are of bounded length.

Properties of synth and analz 

We now state some basic properties of the synth and analz operators. The proofs 
are by a routine induction on proof trees. 

Proposition 213 Let $T, T' \subseteq \mathcal{T}$ and $\sigma$ be a substitution. Then the following
properties hold:

$T \subseteq analz(T)$ and $T \subseteq synth(T)$.

If $T \subseteq T'$ then $analz(T) \subseteq analz(T')$ and $synth(T) \subseteq synth(T')$.
$analz(analz(T)) = analz(T)$ and $synth(synth(T)) = synth(T)$.
$\overline{\overline{T}} = \overline{analz(T)} = \overline{T}$.

$\sigma(analz(T)) \subseteq analz(\sigma(T))$ and $\sigma(synth(T)) \subseteq synth(\sigma(T))$.



Definition 214 $t$ is a minimal term of $T$ if $t \in T$ and $t \notin synth(T \setminus \{t\})$. $min(T)$
denotes the set of minimal terms of $T$.

Suppose t is not a minimal term of T. Then from the definition it follows that 
synth (T) = synth (T \ {t}). Since T is generally used as a representative for 
synth (T), if T contains a nonminimal term then there is a smaller representa- 
tive for synth (T). Thus nonminimal terms can be viewed as redundant in such 
situations. 

Proposition 215 For any set of terms $T$, $T \subseteq synth(min(T))$, $synth(T) =
synth(min(T))$ and $\overline{T} = synth(min(analz(T)))$.
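
The operators used in Propositions 213 to 215 can be explored on small examples with the following Python sketch. It is not the paper's definition: the rules of Fig. 1 are not reproduced here, so the code assumes the usual Dolev-Yao reading (analz splits pairs and decrypts when the key is known; synth builds pairs and encryptions) and further assumes symmetric keys. The class names `Atom`, `Pair`, `Enc` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Pair:
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Enc:
    body: "Term"
    key: "Term"  # assumption: symmetric key, so it also serves as decryption key


Term = Union[Atom, Pair, Enc]


def analz(terms: Iterable[Term]) -> FrozenSet[Term]:
    """Close a set of terms under projection and decryption with known keys."""
    known = set(terms)
    changed = True
    while changed:
        changed = False
        for t in list(known):
            if isinstance(t, Pair):
                parts = {t.left, t.right}
            elif isinstance(t, Enc) and t.key in known:
                parts = {t.body}
            else:
                parts = set()
            if not parts <= known:
                known |= parts
                changed = True
    return frozenset(known)


def in_synth(t: Term, terms: FrozenSet[Term]) -> bool:
    """Decide t in synth(terms): t is given, or can be built from its parts."""
    if t in terms:
        return True
    if isinstance(t, Pair):
        return in_synth(t.left, terms) and in_synth(t.right, terms)
    if isinstance(t, Enc):
        return in_synth(t.body, terms) and in_synth(t.key, terms)
    return False


def minimal(terms: Iterable[Term]) -> FrozenSet[Term]:
    """min(T): the terms of T that cannot be synthesized from the rest of T."""
    ts = frozenset(terms)
    return frozenset(t for t in ts if not in_synth(t, ts - {t}))


a, b, k, m = Atom("a"), Atom("b"), Atom("k"), Atom("m")
T = {Pair(a, b), Enc(m, k), k}
closure = analz(T)        # adds a, b and (because k is known) m
basis = minimal(closure)  # the pair and the encryption are redundant here
```

On this example the pair and the encrypted term are non-minimal in `analz(T)`, since both can be rebuilt from atoms that are already present, which matches the remark that non-minimal terms are redundant as representatives of `synth(T)`.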

3 Decidability 

Reduction to well-typed runs: We outline the proof of Theorem 212. The
first step is to prove that for all runs of a tagged protocol there is an equivalent
well-typed run which preserves leakiness. Towards this, we define, for any
substitution $\sigma$, $\sigma_z$ as follows: for all $x \in \mathcal{T}_0$, if $x \in N$ and $\sigma(x) \notin N$ then $\sigma_z(x) = z$,
otherwise $\sigma_z(x) = \sigma(x)$. Suppose Pr is a tagged protocol and $e_1 \cdots e_k$ is a run
of Pr where each $e_i = (\eta_i, \sigma_i, lp_i)$. For every $i \le k$, define $e'_i = (\eta_i, (\sigma_i)_z, lp_i)$.
Note that $e'_i$ is well-typed by definition. We prove the reduction to well-typed
runs by showing that $e'_1 \cdots e'_k$ is a run of Pr which is leaky iff $e_1 \cdots e_k$ is. It
is easy to see that the above transformation (of replacing the $\sigma_i$’s by $(\sigma_i)_z$’s)
does not affect send-admissibility, and hence all send events $e'_i$ are enabled at
$e'_1 \cdots e'_{i-1}$. For a receive event $e'_i$, we know that $t_i \in \overline{T_{i-1}}$ (where $t_i = term(e_i)$
and $T_{i-1} = (infstate(e_1 \cdots e_{i-1}))_I$). If we show that $t'_i \in \overline{T'_{i-1}}$, it would follow
that $e'_1 \cdots e'_k$ is a run of Pr, and also that it is leaky iff $e_1 \cdots e_k$ is (since replacing
the $\sigma_i$’s with $(\sigma_i)_z$’s doesn’t affect new terms generated during the actions, and
since the set of basic terms known to the intruder at the corresponding states in
both the runs is the same).

To prove that $t'_i \in \overline{T'_{i-1}}$ we show how to transform the proof that $t_i \in \overline{T_{i-1}} =
synth(min(analz(T_{i-1})))$. This consists of a synth-proof $\pi$ of $min(analz(T_{i-1})) \vdash t_i$,
and an analz-proof $\varpi_t$ of $T_{i-1} \vdash t$ for each $t$ labelling a leaf of $\pi$. Note that
every term $t$ occurring in a leaf of $\pi$ is either a nonce or an encrypted term
(since tuples can be synthesized from their components and hence are not in
$min(analz(T_{i-1}))$). Every $u$ labelling a node of $\pi$ or one of the $\varpi_t$’s is a subterm
of $t_j$ for some $j \le i$. Letting $r_i = \eta_i(lp_i)$ for each $i$, we see that $t_i$ corresponds to
$\sigma_i(r_i)$. Suppose we type the root of $\pi$ with $(\sigma_i, r_i)$. This will induce a partial
typing of $\pi$ with types of the form $(\sigma_i, w)$ where $w \in ST(r_i)$. Suppose for all leaves
of $\pi$ typed $(\sigma_i, w)$ where $w \in EST(r_i)$ it is shown that $(\sigma_i)_z(w) \in analz(T'_{i-1})$.
Then it can be easily shown that $t'_i = (\sigma_i)_z(r_i) \in \overline{T'_{i-1}}$. (Some of the non-leaf
nodes of $\pi$ might be typed $(\sigma_i, m)$ for some nonce $m$. In such cases it should be
noted that $(\sigma_i)_z(m) = z \in T_0 = T'_0$.)

We now consider encrypted terms $t$ occurring in the leaves of $\pi$ which have
a type $(\sigma_i, w)$ with $w \in EST(r_i)$. If $t = \sigma_j(w')$ for some $j < i$ and $w' \in EST(r_j)$
then the tagging scheme (specifically, item 1 of Proposition 210) ensures that
$w = w'$. So if we prove that $(\sigma_j)_z(w') \in analz(T'_{i-1})$ for some $j < i$ and some
$w' \in ST(r_j)$ such that $w'$ is an encrypted term when $t$ is, then it would follow
that $t'_i \in \overline{T'_{i-1}}$.

Now consider an analz-proof among the $\varpi_t$’s. We can define the set of types
for any node of such a proof by letting each leaf labelled by a term $u$ be typed by
$\{(\sigma_j, r_j) \mid j < i \text{ and } \sigma_j(r_j) = u\}$. This induces a set of types for each node. We
say that a type $(\sigma, r)$ matches a term $t$ iff $\sigma(r) = t$ and $r$ preserves the outermost
structure of $t$, in particular $r$ is an encrypted term when $t$ is. $\varpi_t$ is well-typed
if the above induced typing on $\varpi_t$ types its root with a type which matches $t$.
We can prove that for any $t \in analz(T_{i-1})$ there is a well-typed analz-proof of
$T_{i-1} \vdash t$. This is proved by induction on $i$. The main technical issue here is to
handle the case when a non-atomic term $u$ occurs in a node and is typed only
by nonces. By considering various cases we show that $u \in T_{i-2}$, thus allowing
us to handle this by the induction hypothesis. Once we have proved this, it is
a straightforward induction on proofs to show that $(\sigma_j)_z(r) \in analz(T'_{i-1})$. This
concludes the reduction to well-typed runs.

Reduction to good runs: We now show that for detecting leaks it suffices to 
consider runs that satisfy a specific ‘goodness’ condition. 

Definition 31 Suppose $Pr = (C, \eta)$ is a tagged protocol and $\xi = e_1 \cdots e_k$ is a run
of Pr. For $i, j \le k$, $e_j$ is called a good successor of $e_i$ (and $e_i$ a good predecessor
of $e_j$) in $\xi$ iff: $i < j$ and either $e_i \rightarrow_\ell e_j$, or $EST(e_i) \cap EST(e_j) \neq \emptyset$.

For $i \le k$, $e_i$ is called a good event in $\xi$ iff either $i = k$ or there is some
$j > i$ such that $e_j$ is a good successor of $e_i$. $e_i$ is called a bad event in $\xi$ iff it is
not a good event in $\xi$. A run $\xi$ is called a good run iff all its events are good. A
subsequence $e_{i_1} \cdots e_{i_r}$ of $\xi$ is called a good path in $\xi$ iff for all $j < r$, $e_{i_{j+1}}$ is a
good successor of $e_{i_j}$ in $\xi$.

If $e_j$ is a good successor of $e_i$ then it is possible that $e_j$ strongly depends on $e_i$
in the sense that elimination of $e_i$ from $\xi$ disables $e_j$. If $e_i$ is a bad predecessor
of $e_j$ then $e_i$ can be eliminated while still enabling some “renamed variant” of
$e_j$.

We now prove that whenever a tagged protocol Pr has a well-typed leaky run,
it has a good well-typed leaky run. Fix a tagged protocol $Pr = (C, a_1 b_1 \cdots a_\ell b_\ell)$
and a leaky run $\xi = e_1 \cdots e_k$ of Pr such that no proper prefix of $\xi$ is leaky. If $\xi$ is
a good run, we are done; otherwise, there is a bad event occurring in $\xi$. Let $r$
be the index of the latest bad event in $\xi$. Let $T = \mathcal{T}_0 \cap (analz(T_r) \setminus analz(T_{r-1}))$
(where $T_i = (infstate(e_1 \cdots e_i))_I$). Since $e_1 \cdots e_{k-1}$ is not leaky, there cannot be an
$m \in T$ and $r' < r$ such that $m$ is secret at $e_1 \cdots e_{r'}$. Thus $T \subseteq NT(e_r)$. Let $\tau$ be
a substitution which maps every $m \in T$ to $z$ and is identity otherwise. For all
$e_i = (\eta_i, \sigma_i, lp_i)$ let $e'_i = (\eta_i, \tau \circ \sigma_i, lp_i)$ where $(\tau \circ \sigma_i)(t) = \tau(\sigma_i(t))$ for all $t$. We
now show that $\xi' = e'_1 \cdots e'_{r-1} e'_{r+1} \cdots e'_k$ is a run of Pr which is leaky; but
the index of the latest bad event in it is less than $r$, and hence we can repeat
the process, eventually obtaining a good run.

We first take up the task of proving that $\xi'$ is a run of Pr. We first note
that the bad event $e_r$ is not in the local past of any other event and also the
substitution $\tau$ does not affect the new terms generated by events other than
$e_r$ and hence does not affect send-admissibility. Thus all the send events of $\xi'$
are still enabled by the events occurring earlier. It is only the receive events we
have to worry about. Here again if $e_r$ is a receive event, then it is easy to see
that $T = \emptyset$, i.e., nothing new is learnt by the intruder because of $e_r$, and hence
enabledness of the other events is not affected even if $e_r$ is eliminated from $\xi$.
Thus again $\xi'$ is a run of Pr. The nontrivial case is when $e_r$ is a send event and
$e_q$ is a receive event for some $q > r$. In this case we know that $t_q \in \overline{T_{q-1}}$ ($t_i$
denotes $term(e_i)$). If we show that $t_q \in \overline{(T_{q-1} \cup T) \setminus \{t_r\}}$ we are through, since
$\tau(T) = \{z\} \subseteq T_0$ and hence $\tau(t_q) \in \overline{\tau(T_{q-1} \setminus \{t_r\})}$.

We first show that for all $q : r < q \le k$ and all analz-proofs $\pi$ whose root
is labelled $T_q \vdash u$ and such that for all $T_q \vdash t$ labelling the non-root nodes of
$\pi$, $t$ is not secret at $e_1 \cdots e_{q-1}$, $u \in (analz(T_r) \cap ST(t_r)) \cup analz((T_q \cup T) \setminus \{t_r\})$. The
intuition behind the proof is that if the proof has a decrypt node involving the
terms $\{t\}_k$ and $k$ then $k$ itself is not secret at the point when $\{t\}_k$ is revealed
to the intruder, and hence $t$ is known to the intruder at the point when $\{t\}_k$ is
known. Thus depending on whether $t_r$ occurs in the leftmost leaf of $\pi$ or not,
$u \in analz(T_r) \cap ST(t_r)$ or $u \in analz((T_q \cup T) \setminus \{t_r\})$.

Now suppose $q > r$ and $e_q$ is a receive event. We know that $t_q \in \overline{T_{q-1}}$. In fact
$t_q \in synth(analz(T_{q-1}) \cap ST(t_q))$. Consider any $u \in analz(T_{q-1}) \cap ST(t_q)$. For all
analz-proofs $\pi$ of $T_{q-1} \vdash u$ and for all $T_{q-1} \vdash t$ labelling the non-root nodes of $\pi$,
(since $e_1 \cdots e_{q-1}$ is not leaky) $t$ is not secret at $e_1 \cdots e_{q-2}$. Hence we can apply the result of
the previous paragraph to this case and conclude that $u \in (analz(T_r) \cap ST(t_r)) \cup
analz((T_{q-1} \cup T) \setminus \{t_r\})$. If $u \in analz((T_{q-1} \cup T) \setminus \{t_r\})$ we are done. Otherwise
$u \in analz(T_r) \cap ST(t_r)$. If now $EST(u) \neq \emptyset$ then since $u \in ST(t_r) \cap ST(t_q)$ it
would follow that $EST(t_q) \cap EST(t_r) \neq \emptyset$ in contradiction to the fact that $e_q$ is
not a good successor of $e_r$. Thus $EST(u) = \emptyset$ and $u$ is a tuple of atomic terms. In
this case $u \in synth(analz(\{u\}) \cap \mathcal{T}_0)$. But then $analz(\{u\}) \cap \mathcal{T}_0 \subseteq analz(T_r) \cap \mathcal{T}_0 \subseteq
analz(T_{r-1} \cup T)$. This implies that $u \in \overline{(T_{q-1} \cup T) \setminus \{t_r\}}$. Thus we have proved
that $analz(T_{q-1}) \cap ST(t_q) \subseteq \overline{(T_{q-1} \cup T) \setminus \{t_r\}}$ and hence $t_q \in \overline{(T_{q-1} \cup T) \setminus \{t_r\}}$.
We are left with proving that $\xi'$ is leaky. Since $\xi$ is leaky (and $e_1 \cdots e_{k-1}$ is
not), we can choose an analz-proof $\pi$ whose root is labelled $T_k \vdash m$ for some $m$
which is secret at $e_1 \cdots e_{k-1}$ and such that for all $T_k \vdash t$ labelling the non-root
nodes of $\pi$, $t$ is not secret at $e_1 \cdots e_{k-1}$. As observed earlier we can conclude that
$m \in analz((T_k \cup T) \setminus \{t_r\}) \cup analz(T_r)$. Now we note that since $m$ is secret at
$e_1 \cdots e_{k-1}$, $m \notin analz(T_r)$ (and thus $m \notin T$ as well). Therefore $m \in analz((T_k \cup
T) \setminus \{t_r\})$. Since $m \notin T$, it follows that $\tau(m) = m$. It can also be shown that $m \notin
NT(e_r)$. Thus $m$ is secret at $e_1 \cdots e_{r-1} e_{r+1} \cdots e_{k-1}$ as well. Therefore $\tau(m) = m$
is secret at $e'_1 \cdots e'_{r-1} e'_{r+1} \cdots e'_{k-1}$. Since $m \in analz((T_k \cup T) \setminus \{t_r\})$ and since
$\tau(T) = \{z\} \subseteq T_0$, it follows that $m = \tau(m) \in analz(\tau(T_k \setminus \{t_r\}))$. Thus $\xi'$ is
leaky. This proves the reduction to good runs.

Bounding the length of good runs: We are left with proving that good runs
are of bounded length, and further that it suffices to check a finite set of runs of
Pr for leakiness. Suppose $Pr = (C, a_1 b_1 \cdots a_\ell b_\ell)$. Suppose $\xi$ is some run of Pr and
suppose that $e_1 \cdots e_r$ is a good path in $\xi$. The tagging scheme (specifically item
1 of Proposition 210) ensures that there exists a sequence $i_1 < \cdots < i_r \le 2 \cdot \ell$
such that for all $j \le r$, $act(e_j)$ is an instance of $act_{i_j}(Pr)$, where the notation
$act_i(Pr)$ denotes $a_{(i+1)/2}$ if $i$ is odd, and denotes $b_{i/2}$ if $i$ is even. Thus all good
paths in $\xi$ are of length at most $2 \cdot \ell$. From item 2 of Proposition 210 (which is
an immediate consequence of our tagging scheme), we see that there are at most
two good predecessors of any event in $\xi$. Putting these two facts together we can see that
any good run is of length at most $2^{2\ell+1} - 1$. Now in any run of Pr of length
bounded by $B$, only a bounded number of new nonces and keys are mentioned
(the bound depending on the specification of Pr and $B$), apart from $CT(Pr)$,
the different $K_A$’s and $z$, of course. They can all be uniformly renamed using
terms in a fixed finite set $T$ and thus it suffices to consider runs of Pr which are
of bounded length and which refer to basic terms from $T$. Since the runs are
well-typed, the width of the terms occurring in the runs is also determined by
the specification of Pr. Thus to check if a protocol has a good well-typed leaky
run it suffices to check in a finite set of runs and thus this problem is decidable.

4 Discussion 

We have followed a long chain of argument to show that detecting leaks in tagged
protocols amounts to detecting leaks in a bounded set of runs, each of whose
length is bounded. Here we have used a fixed tagging scheme. It is conceivable
that many other tagging schemes would serve equally well. This raises the
question of deciding whether a given well-formed protocol is ‘taggable’ or not
(preserving leaks). If the only attacks on the protocol were type flaw attacks,
tagging may be used to eliminate them and hence this question amounts to deciding
whether the given protocol has non-type-flaw attacks, assuming that it
has some attacks. This is an interesting issue not answered here. In ongoing work,
we are also looking to extend the techniques used here to prove the decidability
of security properties statable in a simple modal logic formalism.
Abstract. We study the complexity of quantum complexity classes like EQP,
BQP, NQP (quantum analogs of P, BPP, and NP, respectively) using classical
complexity classes like ZPP, WPP, C=P. The contributions of this paper
are threefold. First, we show that relative to an oracle, ZPP is not contained
in WPP. As an immediate consequence, this implies that no relativizable
proof technique can improve the best known classical upper bound for BQP
(BQP $\subseteq$ AWPP [16]) to BQP $\subseteq$ WPP and the best known classical lower
bound for EQP (P $\subseteq$ EQP) to ZPP $\subseteq$ EQP. Second, we extend some known
oracle constructions involving counting and quantum complexity classes to immunity
separations. Third, motivated by the fact that counting classes (like LWPP,
AWPP, etc.) are the best known classical upper bounds on quantum complexity
classes, we study properties of these counting classes. We prove that WPP is
closed under polynomial-time truth-table reductions, while we construct an oracle
relative to which WPP is not closed under polynomial-time Turing reductions.
This shows that proving the equality of the similar-appearing classes LWPP and
WPP would require nonrelativizable techniques. We also prove that both AWPP
and APP are closed under $\le^{UP}_T$ reductions, and use these closure properties
to prove strong consequences of the following hypotheses: NQP $\subseteq$ BQP and
EQP = NQP.

1 Introduction 

Quantum complexity classes like EQP, BQP [4] (quantum analogs, respectively, of P
and BPP [17]), and NQP [1] (the quantum analog of NP) are defined using quantum
Turing machines, the quantum analog of classical Turing machines. EQP is the class
of languages $L$ accepted by a quantum Turing machine $M$ running in polynomial time
such that, for each $x \in \Sigma^*$, if $x \in L$, then the probability that $M(x)$ accepts is 1, and
if $x \notin L$, then the probability that $M(x)$ accepts is 0. BQP is the class of languages
$L$ accepted by a quantum Turing machine $M$ running in polynomial time such that, for
each $x \in \Sigma^*$, if $x \in L$, then the probability that $M(x)$ accepts is at least 2/3, and
if $x \notin L$, then the probability that $M(x)$ accepts is at most 1/3. NQP is the class of
languages $L$ accepted by a quantum Turing machine $M$ running in polynomial time such
that, for each $x \in \Sigma^*$, $x \in L$ if and only if the probability that $M(x)$ accepts is nonzero.

Quantum complexity classes represent the computational power of quantum com- 
puters. Some fundamental computational problems — for example, factoring, discrete 
logarithm [33], Pell’s equation, and principal ideal problem [22] — are not believed to 
be in BPP (and thus, not believed to be in P), and yet are provably in BQP. One of 
the key issues in quantum complexity theory is studying the relationship between clas- 
sical and quantum complexity classes. The inclusion relationships of BQP with some 
natural classical complexity classes are known. Bernstein and Vazirani [4] show that
BPP $\subseteq$ BQP $\subseteq$ $\mathrm{P}^{\#\mathrm{P}}$. Adleman, DeMarrais, and Huang [1] improve that to BQP $\subseteq$ PP.
Fortnow and Rogers [16] show that the study of counting classes can give us insights
into the classical complexity of quantum complexity classes. In particular, they study
the complexity of BQP using gap-definable counting classes [12]. Loosely speaking,
gap-definable counting classes capture the power of computing via counting the gap
(i.e., difference) between the number of accepting and rejecting paths in a nondeterministic
polynomial-time Turing machine. Fortnow and Rogers prove that BQP $\subseteq$ AWPP,
where AWPP is a gap-definable counting class. Since AWPP $\subseteq$ PP, they give a better
upper bound on BQP than that of Adleman, DeMarrais, and Huang. Thus, the best
known lower and upper bounds for BQP in terms of classical complexity classes are,
respectively, BPP and AWPP: BPP $\subseteq$ BQP $\subseteq$ AWPP $\subseteq$ PP. Similarly the best
known classical lower and upper bounds for EQP are, respectively, P and LWPP:
P $\subseteq$ EQP $\subseteq$ LWPP $\subseteq$ AWPP $\subseteq$ PP.

In light of this connection, due to Fortnow and Rogers, between quantum and counting
complexity classes, it is natural to ask if there are counting (or for that matter other
classical) complexity classes that are better lower (or upper) bounds for BQP. More
formally, is there a counting class $\mathcal{C}$ such that BPP $\subseteq \mathcal{C} \subseteq$ BQP? Is there a counting class
$\mathcal{D}$ such that BQP $\subseteq \mathcal{D} \subseteq$ AWPP? Similarly, it is interesting to ask the corresponding
questions for EQP. Unfortunately, resolving these inclusion relationships can be difficult,
and may be out of reach of relativizable techniques. Green and Pruim [20] construct
an oracle relative to which EQP $\not\subseteq \mathrm{P}^{\mathrm{NP}}$, and thus they show that proving EQP $\subseteq \mathrm{P}^{\mathrm{NP}}$
is outside the scope of relativizable techniques. For each prime $p$ and integer $k \ge 1$, de
Graaf and Valiant [10] construct an oracle relative to which EQP $\not\subseteq \mathrm{Mod}_{p^k}\mathrm{P}$.

In this paper, we use counting classes to study the relativized complexity of EQP and
BQP. In particular, we study the relativized complexity of EQP and BQP by separating
counting classes relative to an oracle. We construct oracles $A$ and $B$ such that $\mathrm{ZPP}^A \not\subseteq
\mathrm{WPP}^A$, and $\mathrm{RP}^B \not\subseteq \mathrm{C_{=}P}^B$. It follows from known inclusions that $\mathrm{BQP}^A \not\subseteq \mathrm{WPP}^A$,
$\mathrm{ZPP}^A \not\subseteq \mathrm{EQP}^A$, and $\mathrm{BQP}^B \not\subseteq \mathrm{C_{=}P}^B$. Note that WPP $\subseteq$ AWPP, P $\subseteq$ ZPP, RP $\subseteq$
BQP, and EQP $\subseteq$ LWPP $\subseteq$ WPP $\subseteq$ C=P. In fact, WPP is the largest known natural
gap-definable subclass of AWPP, and ZPP is the smallest known natural probabilistic
complexity class that contains P. We also shed light on the relationship between problems
in C=P and those solvable by probabilistic algorithms. Even though C=P contains ZPP
(= RP $\cap$ coRP) in all relativized worlds, using relativizable techniques it is impossible
to show that C=P contains all problems in RP.

The separations of counting classes mentioned above, for example, $\mathrm{RP}^B \not\subseteq \mathrm{C_{=}P}^B$,
which lead to the separation of quantum complexity classes from counting complexity
classes for reasons mentioned above, imply for example, that relativizable techniques
cannot prove that BQP $\subseteq$ C=P. However, this leaves open the possibility that each set
in $\mathrm{BQP}^B$ can be approximated by a set in $\mathrm{C_{=}P}^B$ in the following sense: for each infinite
set $L \in \mathrm{BQP}^B$, there exists an infinite subset $L' \subseteq L$ such that $L' \in \mathrm{C_{=}P}^B$. A strong
(or immunity) separation of $\mathrm{BQP}^B$ from $\mathrm{C_{=}P}^B$ will preclude this possibility. Strong
separations have been used to study the relativized complexity of complexity classes in
many different settings, for example, the polynomial-time hierarchy [25,7], the boolean
hierarchy over RP [8], and counting classes [32]. We prove strong separations between
counting classes, and from these get strong separations of quantum complexity classes
from counting classes. For example, we show the existence of oracles $A$ and $A'$ such
that $\mathrm{RP}^A$ is $\mathrm{C_{=}P}^A$-immune, and $\mathrm{BPP}^{A'}$ is $\mathrm{P}^{\mathrm{C_{=}P}^{A'}}$-immune. Using known inclusions,
we get that $\mathrm{BQP}^{A'}$ is $\mathrm{P}^{\mathrm{C_{=}P}^{A'}}$-immune. We extend the oracle separation of EQP from
$\mathrm{Mod}_{p^k}\mathrm{P}$ in [10] by constructing, for each prime $p$ and integer $k \ge 1$, an oracle relative
to which EQP is $\mathrm{Mod}_{p^k}\mathrm{P}$-immune.

Results by Fortnow and Rogers [16], de Graaf and Valiant [10], and those of this
paper, show the connection between quantum and counting complexity classes. Thus,
it becomes important to study the properties of these counting complexity classes. In
particular, we study the reduction closure properties of these counting classes. Fenner,
Fortnow, and Kurtz [12] show that counting classes SPP and LWPP are closed under
polynomial-time Turing reductions. (In fact, they prove that $\mathrm{SPP}^{\mathrm{SPP}} = \mathrm{SPP}$, and
$\mathrm{SPP}^{\mathrm{LWPP}} = \mathrm{LWPP}$.) They ask whether the same holds for WPP. We prove that WPP
is closed under polynomial-time truth-table reductions. We also show that improving this
result to closure under polynomial-time Turing reductions will require non-relativizable
techniques: There is an oracle $A$ such that $\mathrm{P}^{\mathrm{WPP}^A} \neq \mathrm{WPP}^A$. Thus, it follows that,
relative to oracle $A$, WPP strictly contains LWPP. For counting classes AWPP and
APP, we prove a potentially stronger closure property, namely that both AWPP and
APP are closed under $\le^{UP}_T$ (unambiguous polynomial-time Turing) reductions.

Vyalyi [38] recently proved, using Toda’s Theorem [36], that QMA, the class of
languages such that a “yes” answer can be verified by a 1-round quantum interactive
proof, is unlikely to contain PP, since if it does then PP contains PH. Using Vyalyi’s
result and the reduction closure results mentioned above, we prove consequences of
the “NQP $\subseteq$ BQP” and “EQP = NQP” hypotheses. Note that these hypotheses are
quantum counterparts of the “NP $\subseteq$ BPP” and the “P = NP” hypotheses. Zachos [40]
proved that if NP $\subseteq$ BPP, then PH $\subseteq$ BPP. We prove that if NQP $\subseteq$ BQP, then
PH $\subseteq$ AWPP, from which it follows that PH is low for PP. Similarly, we prove that
EQP = NQP $\Longrightarrow$ PH $\subseteq$ WPP, from which it follows that PH is low for PP.

Due to space limitations, most of the proofs are omitted. They can be found in [34].

2 Preliminaries 

Our alphabet is $\Sigma = \{0, 1\}$. For any $n \in \mathbb{N}$ and any $x \in \Sigma^*$, $x\Sigma^n = \{xw \mid w \in \Sigma^n\}$. For
any $x \in \Sigma^*$, $|x|$ denotes the length of the string $x$, and the integer $bin(x)$ corresponding
to string $x$ is defined as the value of the binary number $1x$.

For general complexity-theoretic background and for the definition of complexity
classes such as P, NP, FP etc., we refer the reader to the handbook [23]. NPTM
stands for “nondeterministic polynomial-time Turing machine” and DPTM stands for
“deterministic polynomial-time Turing machine.” Throughout this paper, for any
(nondeterministic or deterministic or quantum) machine $N$, and for any $x \in \Sigma^*$, we use
$N(x)$ as a shorthand for “the computation of $N$ on input $x$.” Given an oracle NPTM
$N$ and a set $A$, we use $\#acc_{N^A}(x)$ (respectively, $\#rej_{N^A}(x)$) to denote the number of
accepting (respectively, rejecting) paths of $N$ on $x$ with oracle $A$.

Definition 1 ([12]). If $N$ is an NPTM, define the function $gap_N : \Sigma^* \to \mathbb{Z}$ as follows:
for all $x \in \Sigma^*$, $gap_N(x) = \#acc_N(x) - \#rej_N(x)$. If $N$ is an oracle NPTM then,
for every set $A$, define the function $gap_{N^A} : \Sigma^* \to \mathbb{Z}$ as follows: for each $x \in \Sigma^*$,
$gap_{N^A}(x) = \#acc_{N^A}(x) - \#rej_{N^A}(x)$.

GapP is the class of functions $f$ such that there exists an NPTM $N$ such that $f = gap_N$.
We define the following gap-definable counting classes [12].

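Definition 2.
1. [9,24,2] For each $k \ge 2$, $\mathrm{Mod}_k\mathrm{P} = \{L \mid (\exists g \in \mathrm{GapP})(\forall x \in \Sigma^*)[x \in L \iff g(x) \not\equiv 0 \pmod{k}]\}$.

2. [30,18] $\oplus\mathrm{P} = \mathrm{Mod}_2\mathrm{P}$.

3. [29,12] $\mathrm{SPP} = \{L \mid (\exists g \in \mathrm{GapP})(\forall x \in \Sigma^*)[(x \in L \Longrightarrow g(x) = 1) \wedge (x \notin L \Longrightarrow g(x) = 0)]\}$.

4. [12] $\mathrm{LWPP} = \{L \mid (\exists g \in \mathrm{GapP})(\exists h \in \mathrm{FP} : 0 \notin \mathrm{range}(h))(\forall x \in \Sigma^*)[(x \in L \Longrightarrow g(x) = h(0^{|x|})) \wedge (x \notin L \Longrightarrow g(x) = 0)]\}$.

5. [12] $\mathrm{WPP} = \{L \mid (\exists g \in \mathrm{GapP})(\exists h \in \mathrm{FP} : 0 \notin \mathrm{range}(h))(\forall x \in \Sigma^*)[(x \in L \Longrightarrow g(x) = h(x)) \wedge (x \notin L \Longrightarrow g(x) = 0)]\}$.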
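
The acceptance conditions in Definitions 1 and 2 can be made concrete with a toy Python sketch. It is only an illustration: a real NPTM is replaced by a hypothetical function that returns the accept/reject outcome of each computation path, and the names `gap`, `mod_k_accepts`, `wpp_verdict`, and `bit_guesser` are invented for this example.

```python
from typing import Callable, List

# Hypothetical stand-in for an NPTM: maps an input string to the outcomes of
# its computation paths (True = accepting path, False = rejecting path).
PathMachine = Callable[[str], List[bool]]


def gap(machine: PathMachine, x: str) -> int:
    """gap_N(x) = #acc_N(x) - #rej_N(x)."""
    outcomes = machine(x)
    accepting = sum(1 for outcome in outcomes if outcome)
    rejecting = len(outcomes) - accepting
    return accepting - rejecting


def mod_k_accepts(g_value: int, k: int) -> bool:
    """Mod_k P condition: x is in L iff g(x) is not divisible by k."""
    return g_value % k != 0


def wpp_verdict(g_value: int, h_value: int) -> bool:
    """WPP-style promise: g(x) = h(x) means 'in L', g(x) = 0 means 'not in L'.

    Any other gap value violates the promise. SPP is the special case where
    h is constantly 1; LWPP is the case where h depends only on |x|.
    """
    if h_value == 0:
        raise ValueError("h must never take the value 0")
    if g_value == h_value:
        return True
    if g_value == 0:
        return False
    raise ValueError("gap value violates the promise")


def bit_guesser(x: str) -> List[bool]:
    """Toy machine: one path per position of x, accepting iff that bit is 1."""
    return [c == "1" for c in x]


g = gap(bit_guesser, "1101")  # three accepting paths, one rejecting path
```

Here `g` is 2, so under the parity condition (`k = 2`) the input `"1101"` would be rejected, while `"111"` (gap 3) would be accepted.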

The counting classes AWPP [13] and APP [27] were defined to study the sets that 
are low for PP. 

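Definition 3 ([13,27]). For every $L \subseteq \Sigma^*$,

1. $L$ is in AWPP if for every polynomial $r(\cdot) > 0$, there exist a $g \in \mathrm{GapP}$ and a
polynomial $p$ such that, for all $x \in \Sigma^*$,
$x \in L \Longrightarrow 1 - 2^{-r(|x|)} \le \frac{g(x)}{2^{p(|x|)}} \le 1$, and
$x \notin L \Longrightarrow 0 \le \frac{g(x)}{2^{p(|x|)}} \le 2^{-r(|x|)}$.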

2. Lisin APP if for every polynomial rf) > 0, there exist g,h G GapP, 0 f. range(h), 

such that, for all x G E* , x G L (1 — ^ < F and 

xiL ^ 0 < < 2-’-(N). 

For background information on quantum complexity theory and for the definition
of quantum Turing machine, we recommend [28,21]. We now define the quantum
complexity classes that will be used in this paper.

Definition 4 ([4,1]). EQP (respectively, BQP, NQP) is the set of all languages $L \subseteq
\Sigma^*$ such that there is a polynomial-time quantum Turing machine $M$ such that, for
each $x \in \Sigma^*$, $x \in L \Longrightarrow \Pr[M(x) \text{ accepts}] = 1$ (respectively, $\ge 2/3$, $\neq 0$), and
$x \notin L \Longrightarrow \Pr[M(x) \text{ accepts}] = 0$ (respectively, $\le 1/3$, $= 0$).

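The following proposition gives the known inclusion relationships among the classes
defined above.
Proposition 1 ([12,17,26,13,11,16,14,39]). P $\subseteq$ ZPP $\subseteq$ RP $\subseteq$ BPP $\subseteq$ AWPP;
P $\subseteq$ UP $\subseteq$ FewP $\subseteq$ SPP $\subseteq$ LWPP $\subseteq$ WPP $\subseteq$ C=P $\subseteq$ PP; ZPP $\subseteq$ coRP $\subseteq$
coNP $\subseteq$ C=P; WPP $\subseteq$ AWPP $\subseteq$ APP $\subseteq$ PP; P $\subseteq$ EQP $\subseteq$ LWPP; EQP $\subseteq$
BQP $\subseteq$ AWPP; SPP $\subseteq$ $\oplus$P; FewP $\subseteq$ NP $\subseteq$ coC=P = NQP.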

3 Separation Results 

One way to study the power of quantum complexity classes is to lower bound the 
complexity of these classes with well known complexity classes, for example NP. The 
best known lower bound for EQP is P. In fact, EQP is not known to contain even a 
single problem that is not already known to be in P. Bennett et al. [3] show that relative 
to a random oracle, NP is not contained in EQP with probability one, and, relative to a 
permutation oracle chosen uniformly at random, NP $\cap$ coNP is not contained in EQP
with probability one. Thus, it is interesting to ask the following questions. Are there
natural classes between P and NP $\cap$ coNP that are contained in EQP? Are there natural
classes between P and NP $\cap$ coNP that are not contained in EQP in some relativized
world? We prove that the latter is true by showing that there is a relativized world where 
ZPP is not contained in EQP. In fact, we prove a slightly stronger statement. We prove, 
as the next theorem, that there is an oracle relative to which ZPP is not contained in 
WPP, a superclass of EQP [16]. It is interesting to note that there is an oracle, due to 
Fortnow [15], relative to which SPP, a subclass of WPP, strictly contains an infinite 
polynomial-time hierarchy. In contrast, our oracle provides a completely different picture 
of WPP in a relativized world: a world in which WPP sets are not powerful enough to 
capture a seemingly small subclass, ZPP, of NP. 

Theorem 1. $(\exists A)[\mathrm{ZPP}^A \not\subseteq \mathrm{WPP}^A]$.

Corollary 1. There exists an oracle $A \subseteq \Sigma^*$ such that, for each $\mathcal{C} \in$ {ZPP, RP, BPP,
NP, BQP, C=P $\cap$ coC=P, AWPP, APP}, and each $\mathcal{D} \in$ {UP, FewP, SPP, EQP,
LWPP, WPP}, $\mathcal{C}^A \not\subseteq \mathcal{D}^A$.

Note that Corollary 1 shows that proving that error-free quantum polynomial-time (EQP)
algorithms exist for all languages in ZPP will require nonrelativizable techniques.
Corollary 1 also shows that, using relativizable techniques, we cannot lower the best known
classical upper bound for BQP from AWPP to even WPP, the largest known natural
gap-definable subclass of AWPP. In the light of this result, it is interesting to seek a
different classical upper bound for BQP. That is, it is interesting to ask which other
counting classes upper bound the complexity of BQP. One such counting class is C=P.
Note that it is not known whether AWPP $\subseteq$ C=P or C=P $\subseteq$ AWPP, though both these
classes contain WPP. Regardless of what the inclusion relationship is between C=P
and AWPP, it is conceivable for BQP to be a subset of C=P. However, we show that
proving the containment BQP $\subseteq$ C=P is beyond the reach of relativizable techniques.
This result is a corollary of Theorem 2. 

Tarui [35] used a lower bound technique in decision trees for a certain AC$^0$ function
to show that BPP is not contained in $\mathrm{P}^{\mathrm{C_{=}P}}$ in some relativized world. Green [19] used
circuit lower bound techniques to obtain the same result. In contrast with BPP, RP is
contained in $\mathrm{P}^{\mathrm{C_{=}P}}$ in every relativized world. We construct an oracle relative to which
RP is not contained in C=P. This result is optimal in the sense that the largest known
natural subclass of RP, ZPP, is contained in C=P in every relativized world. This oracle
separation of RP from C=P is also a strengthening of the oracle separation of NP from
C=P by Torán [37].

Theorem 2. $(\exists A)[\mathrm{RP}^A \not\subseteq \mathrm{C_{=}P}^A]$.

Corollary 2. There exists an oracle $A \subseteq \Sigma^*$ such that, for each $\mathcal{C} \in$ {BPP, NP, BQP,
AWPP, APP}, $\mathcal{C}^A \not\subseteq \mathrm{C_{=}P}^A$.

4 Immunity Separation Results 

In Section 3, we saw that relativizable techniques cannot prove that BQP C C=P. But 
can we at least prove that in every relativized world, every BQP set can be approximated 
(in some sense) by a set from C=P? For example, can we prove that every infinite set in 
BQP has an infinite subset that is in C=P? In this section, we prove that in many cases,
proving such approximability results is beyond the reach of relativizable techniques.
Loosely speaking, class $\mathcal{C}$ is said to strongly separate from $\mathcal{D}$ if there exists an oracle
relative to which there exists a set $S$ in $\mathcal{C}$ that cannot even be approximated (in the
sense mentioned above) by any set in $\mathcal{D}$. The set $S$ is said to be $\mathcal{D}$-immune. Immunity
separations have been used, for example, by Ko [25], and Bruschi [7] to study the 
nature of the polynomial-time hierarchy, by Bruschi, Joseph, and Young [8] for strongly 
separating the boolean hierarchy over RP, and by Rothe [32] for studying the complexity 
of counting classes. 

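Definition 5. Let $\mathcal{C}$ be a class of languages. An infinite language $L$ is called $\mathcal{C}$-immune
if $(\forall L' \in \mathcal{C})[\|L'\| = \infty \Rightarrow L' \cap \overline{L} \neq \emptyset]$.

Given relativizable classes $\mathcal{C}_1$ and $\mathcal{C}_2$, an oracle $E$ strongly separates $\mathcal{C}_2^E$ from $\mathcal{C}_1^E$ if
there exists an infinite language $L \in \mathcal{C}_2^E$ which is $\mathcal{C}_1^E$-immune.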

M. de Graaf and P. Valiant [10] prove that, for any prime $p$ and integer $k \geq 1$, there
exists an oracle $A$ such that $\mathrm{EQP}^A \not\subseteq \mathrm{Mod}_{p^k}\mathrm{P}^A$. In Theorem 3, we strengthen this
result by proving that there is a relativized world where EQP strongly separates from
$\mathrm{Mod}_{p^k}\mathrm{P}$. To prove that the test language we use in the proof of this strong separation
result (Theorem 3) is in (relativized) EQP, we make use of the observation by Boyer et
al. [6] that quantum database searching can be done in polynomial time with certainty
if the number of solutions is exactly one fourth of the total search-space.

Theorem 3. For every prime $p$ and integer $k \geq 1$, there exist an oracle $A$ and an infinite
set $L_A$ such that $L_A \in \mathrm{EQP}^A$ and $L_A$ is $\mathrm{Mod}_{p^k}\mathrm{P}^A$-immune.



Corollary 3. There exists an oracle $A$ such that, for each $\mathcal{C} \in$ {EQP, BQP, LWPP,
WPP, AWPP, APP, C=P, PP} and for each $\mathcal{D} \in$ {UP, FewP, SPP}, $\mathcal{C}^A$ is immune
to $\mathcal{D}^A$.

Theorem 2 separates RP from C=P, and as a corollary we get a separation of BQP from
C=P. In Theorem 4, we prove that, relative to an oracle, RP strongly separates from
C=P, which in turn implies that BQP strongly separates from C=P. We use a sufficient
condition by Bovet, Crescenzi, and Silvestri [5] for lifting simple separations between
complexity classes to immunity separations.

Theorem 4. There exists an oracle $A$ such that $\mathrm{RP}^A$ contains a $\mathrm{C_{=}P}^A$-immune set.

Tarui [35] and Green [19] independently showed that BPP separates from $\mathrm{P}^{\mathrm{C_{=}P}}$ in some
relativized world. In Theorem 5 we extend the oracle separation of BPP from $\mathrm{P}^{\mathrm{C_{=}P}}$ to a
strong separation result. From this it follows that, relative to an oracle, BQP strongly
separates from $\mathrm{P}^{\mathrm{C_{=}P}}$.

Theorem 5. There exists an oracle $A$ such that for every complexity class $\mathcal{C} \in$ {BPP,
BQP, $\Sigma_2^p \cap \Pi_2^p$, AWPP, APP, PP}, $\mathcal{C}^A$ contains a $\mathrm{P}^{\mathrm{C_{=}P}^A}$-immune set.

5 Closure and Collapse Results 

We have seen that the study of counting complexity classes like WPP, C=P etc. can
give us useful insights into the classical complexity of quantum classes. In this section,
we further study properties of these counting classes, and use these properties to prove
consequences of the following hypothesis: NQP $\subseteq$ BQP. Note that this hypothesis is
the quantum analog of the “NP $\subseteq$ BPP” hypothesis. Zachos [40] proved that NP $\not\subseteq$
BPP unless the entire polynomial-time hierarchy is contained in BPP, and thus it is
unlikely that NP $\subseteq$ BPP. In this section, we prove as Corollary 4 a strong consequence
for NQP $\subseteq$ BQP: NQP $\subseteq$ BQP $\Rightarrow$ $\mathrm{PP}^{\mathrm{PH}} = \mathrm{PP}$. We prove this implication
by showing a reduction closure property of AWPP, and then using a recent result
by Vyalyi [38]. Vyalyi [38] showed that if QMA, the quantum analog of
the Merlin-Arthur class MA, equals PP then the entire polynomial-time hierarchy is
contained in PP. In his proof, he implicitly proves, using Toda’s theorem [36], that PH
is contained in $\mathrm{UP}^{\mathrm{C_{=}P}}$ in every relativized world.

Theorem 6 ([38]). PH $\subseteq \mathrm{UP}^{\mathrm{C_{=}P}}$.

In Theorem 7 we show that both AWPP and APP are closed under unambiguous
polynomial-time Turing reductions. From this closure property of AWPP and
Theorem 6, we conclude that if NQP $\subseteq$ BQP then PH is low for PP.

Theorem 7. (a) $\mathrm{UP}^{\mathrm{AWPP}} \subseteq \mathrm{AWPP}$. (b) $\mathrm{UP}^{\mathrm{APP}} \subseteq \mathrm{APP}$.



Corollary 4. If NQP $\subseteq$ BQP then PH is low for PP.

Proof. This follows from Theorem 6 and Theorem 7(a), and the facts that NQP =
coC=P [14,39], BQP $\subseteq$ AWPP [16], and AWPP is low for PP. □

Theorem 7 shows that AWPP is closed under unambiguous polynomial-time Turing
reductions. What about WPP?
Closure properties of WPP are also interesting in light of the results due to Fenner,
Fortnow, and Kurtz [12]. Fenner et al. study the closure of gap-definable classes under
polynomial-time Turing reductions. They show that LWPP and SPP are closed under
polynomial-time Turing reductions. However, they leave open the corresponding problem
for WPP: Is WPP closed under polynomial-time Turing reductions? Theorem 9
gives a negative answer in a suitable relativized world. Since LWPP is (robustly, i.e., for
all oracles) closed under polynomial-time Turing reductions [12], it follows that we
also have an oracle separating the seemingly similar classes WPP and LWPP. In Theorem 8,
we show that WPP is closed under the weaker polynomial-time truth-table reductions,
while its closure under the potentially stronger unambiguous polynomial-time Turing
reductions is contained in coC=P. The
latter result along with Theorem 6 allows us to conclude that if EQP contains NQP, then
the entire polynomial-time hierarchy is contained in WPP.

Theorem 8. (a) WPP is closed under polynomial-time truth-table reductions. (b)
$\mathrm{UP}^{\mathrm{WPP}} \subseteq$ coC=P.

Corollary 5. If NQP $\subseteq$ EQP then PH $\subseteq$ WPP.

Proof. This follows from Theorem 6 and Theorem 8(b), and the facts that NQP =
coC=P [14,39], EQP $\subseteq$ LWPP [16], and LWPP $\subseteq$ WPP [12]. □

We now prove that there is an oracle relative to which WPP is not closed under 
polynomial-time Turing reductions. Before we state and prove Theorem 9, we state 
a lemma that will be needed in the proof of this result. 

Lemma 1 ([31]). For every $n \geq 17$, the number of primes less than or equal to $n$, $\pi(n)$,
satisfies

$n / \ln n < \pi(n) < 1.25506 \, n / \ln n$.
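
The bounds of Lemma 1 can be checked numerically on a finite range. The following is a minimal sketch, not part of the original proof; the function names `prime_counts` and `lemma1_holds` and the chosen limit are illustrative assumptions.

```python
import math

def prime_counts(limit):
    """Return a list pi where pi[n] is the number of primes <= n (sieve of Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    pi, count = [0] * (limit + 1), 0
    for n in range(limit + 1):
        count += is_prime[n]
        pi[n] = count
    return pi

def lemma1_holds(limit):
    """Check n/ln n < pi(n) < 1.25506 n/ln n for every n with 17 <= n <= limit."""
    pi = prime_counts(limit)
    return all(
        n / math.log(n) < pi[n] < 1.25506 * n / math.log(n)
        for n in range(17, limit + 1)
    )

if __name__ == "__main__":
    print(lemma1_holds(10_000))
```

A finite check of this kind only illustrates the statement; the lemma itself is the cited result [31].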

Theorem 9. $(\exists A)[\mathrm{P}^{\mathrm{WPP}^A} \not\subseteq \mathrm{WPP}^A]$.

Proof. For any $w \in \Sigma^*$, let $\mathrm{pos}(w) = \mathrm{bin}(w) - 2^{|w|} + 1$ represent the lexicographic
position of $w$ among strings of length $|w|$. For every set $A \subseteq \Sigma^*$, $w \in \Sigma^*$, and $n \in \mathbb{N}$,
we define “Witcount”, “Promise” and “Boundary” as follows.

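$\mathrm{Witcount}(A, w) = \|\{x \in \Sigma^* \mid |x| = |w| \wedge wx \in A\}\|$,

$\mathrm{Promise}(A, n) = (\forall w \in \Sigma^n)[\mathrm{Witcount}(A, w) = 0 \vee \mathrm{Witcount}(A, w) = \mathrm{pos}(w)] \wedge$
$(\forall w_1, w_2 \in \Sigma^n)[\mathrm{pos}(w_1) < \mathrm{pos}(w_2) \wedge \mathrm{Witcount}(A, w_2) \neq 0 \Rightarrow$
$\mathrm{Witcount}(A, w_1) \neq 0]$, and

$\mathrm{Boundary}(A, n) = \max\{\mathrm{pos}(w) \mid |w| = n \wedge \mathrm{Witcount}(A, w) \neq 0\}$.

For every set $A \subseteq \Sigma^*$, define $L_A$ as follows.

$L_A = \{0^n \mid \mathrm{Boundary}(A, n) \equiv 1 \pmod 2\}$.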
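
To make these definitions concrete, here is a small sketch over the binary alphabet. It is only an illustration under stated assumptions: `pos` is taken to be the 1-based lexicographic rank, Boundary of an oracle with no witnesses is taken to be 0, and the oracle test "Witcount(A, w) is nonzero" is evaluated by direct counting rather than by a WPP oracle. The helper `build_oracle` is hypothetical. When Promise holds, the strings with nonzero Witcount form an initial segment, so Boundary can be located by binary search.

```python
from itertools import product

def strings(n):
    """All binary strings of length n (n >= 1) in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def pos(w):
    """Assumed reading of pos: 1-based lexicographic rank of w among strings of length |w|."""
    return int(w, 2) + 1

def witcount(A, w):
    """Number of x with |x| = |w| and wx in A."""
    return sum(1 for x in strings(len(w)) if w + x in A)

def promise(A, n):
    ws = strings(n)
    counts = [witcount(A, w) for w in ws]
    values_ok = all(c == 0 or c == pos(w) for w, c in zip(ws, counts))
    # Nonzero counts must form an initial segment in lexicographic order.
    prefix_ok = all(
        counts[i] != 0
        for j in range(len(ws)) if counts[j] != 0
        for i in range(j)
    )
    return values_ok and prefix_ok

def boundary(A, n):
    """Direct evaluation; 0 when no string has a witness (an assumption of this sketch)."""
    return max((pos(w) for w in strings(n) if witcount(A, w) != 0), default=0)

def boundary_by_binary_search(A, n):
    """Uses only tests of the form Witcount(A, w) != 0; valid when promise(A, n) holds."""
    ws = strings(n)
    lo, hi = 0, len(ws)  # invariant: positions <= lo are nonzero, positions > hi are zero
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if witcount(A, ws[mid - 1]) != 0:
            lo = mid
        else:
            hi = mid - 1
    return lo

def build_oracle(n, b):
    """Hypothetical helper: an oracle satisfying Promise at length n with Boundary b."""
    return {w + x for w in strings(n)[:b] for x in strings(n)[:pos(w)]}

if __name__ == "__main__":
    A = build_oracle(3, 5)
    print(promise(A, 3), boundary(A, 3), boundary_by_binary_search(A, 3))
```

The binary search needs about $n$ tests instead of $2^n$, which is the reason the promise is imposed on the oracle.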

Clearly, if $A$ satisfies Promise($A$, $n$) at each length $n$, then $L_A$ is in $\mathrm{P}^{\mathrm{WPP}^A}$ (using binary
search along the strings $w$ with $|w| = n$). We construct an oracle $A$ such that, for each
$n$, Promise($A$, $n$) is true, and $L_A \notin \mathrm{WPP}^A$. Let $(N_s, M_s, p_s)_{s \geq 1}$ be an enumeration
of all triples such that $N_s$ is a nondeterministic polynomial-time oracle Turing machine,
$M_s$ is a deterministic polynomial-time oracle transducer, $p_s$ is a polynomial, and the
running time of both $N_s$ and $M_s$ is bounded by $p_s$ regardless of the oracle. We assume
that the computation paths of an oracle machine include the answers from the oracle.
Given NPTM $N$, $x \in \Sigma^*$, and a computation path $\rho \in \Sigma^*$, we let $\mathrm{sign}(N, x, \rho) = +1$
(respectively, $-1$) if $\rho$ is an accepting (respectively, rejecting) computation path in $N(x)$.
We need the following technical lemmas.

Lemma 2. Let $N, p \in \mathbb{N}$, where $1 < p < N/2$. Let $s(y_1, \ldots, y_N)$ be a multilinear
polynomial with rational coefficients, where each monomial has exactly $p - 1$ different
variables. Suppose that for some $val \in \mathbb{Q}$, it holds that $s(y_1, \ldots, y_N) = val$ for every
$y_1, \ldots, y_N \in \{0, 1\}$ with $\sum_{i=1}^{N} y_i = p$. Then each monomial in $s(y_1, \ldots, y_N)$ has the
same rational coefficient, i.e.,

$s(y_1, y_2, \ldots, y_N) = (val/p) \cdot \sum_{1 \leq i_1 < i_2 < \cdots < i_{p-1} \leq N} y_{i_1} y_{i_2} \cdots y_{i_{p-1}}$.

Lemma 3. Let $N \in \mathbb{N}$ and $p$ be a prime with $p < N/2$. Let $s(y_1, \ldots, y_N)$ be a
multilinear polynomial of total degree $< p$ with integer coefficients. If for some $val \in \mathbb{Z}$,
it holds that

1. $s(0, \ldots, 0) = 0$, and

2. $s(y_1, \ldots, y_N) = val$, for every $y_1, \ldots, y_N \in \{0, 1\}$ with $\sum_{i=1}^{N} y_i = p$,

then $p \mid val$.
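
A special case gives some intuition for Lemma 3. If $s$ is an integer combination of the elementary symmetric polynomials $e_k$ with $1 \leq k < p$, then $s(0, \ldots, 0) = 0$, and on every 0/1 point with exactly $p$ ones $s$ takes the value $\sum_k c_k \binom{p}{k}$, which is divisible by the prime $p$. The sketch below checks this symmetric case only (the lemma itself covers arbitrary multilinear polynomials); the function names and the sample coefficients are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def elementary_symmetric(k, y):
    """e_k(y): number of k-subsets of coordinates that are all 1 (y is a 0/1 vector)."""
    return sum(all(y[i] for i in idx) for idx in combinations(range(len(y)), k))

def values_on_layer(coeffs, N, p):
    """Evaluate s = sum_k coeffs[k] * e_k on all 0/1 points with exactly p ones.

    Returns the set of values taken; for a symmetric s this set has one element.
    """
    values = set()
    for ones in combinations(range(N), p):
        y = [1 if i in ones else 0 for i in range(N)]
        values.add(sum(c * elementary_symmetric(k, y) for k, c in coeffs.items()))
    return values

if __name__ == "__main__":
    N, p = 7, 3                      # p is prime and p < N/2
    coeffs = {1: 4, 2: -5}           # degrees 1 and 2, both below p; no constant term
    vals = values_on_layer(coeffs, N, p)
    print(vals, all(v % p == 0 for v in vals))
```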

The oracle $A$ is constructed in stages. In stage $s$, the membership in $A$ of strings of
length $2n_s$ is decided, and the initial segment $A_{s-1}$ is extended to $A_s$. Our choice of $n_s$
guarantees that the oracle extension in stage $s$ does not affect the computation in earlier
stages. Set $A_0 := \emptyset$ and $n_0 := 17$.

Stage s where s > 1 : Let Ug be large enough so that the previous stages are not affected 
and 2"® > 4-nfpg {ug ) . We diagonalize against nondeterministic polynomial-time oracle 
Turing machine Ng and deterministic polynomial-time oracle transducer Mg. Let val 
be the value computed by Without loss of generality, we assume that 

val 0. LetT = {w G 27^”® | queries w}. 

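(*) Choose a set $B$, $B \subseteq T \cap \Sigma^{2n_s}$, satisfying Promise($B$, $n_s$) such that the following
holds:

Boundary($B$, $n_s$) $\equiv 1 \pmod 2$ $\wedge$ $\mathrm{gap}_{N_s^{A_{s-1} \cup B}}(0^{n_s}) \neq val$, or

Boundary($B$, $n_s$) $\equiv 0 \pmod 2$ $\wedge$ $\mathrm{gap}_{N_s^{A_{s-1} \cup B}}(0^{n_s}) \neq 0$.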

Let $A_s := A_{s-1} \cup B$. Clearly, the construction guarantees that $L_A \notin \mathrm{WPP}^A$. The
feasibility of the construction follows from the following claim.

Claim. For each $s \geq 1$, there exists an oracle extension $B$ satisfying (*).

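Proof. Suppose that in stage $s$ no set $B$ satisfying (*) exists. Then, for every $B \subseteq
T \cap \Sigma^{2n_s}$ satisfying Promise($B$, $n_s$), the following hold.

Boundary($B$, $n_s$) $\equiv 1 \pmod 2$ $\Rightarrow$ $\mathrm{gap}_{N_s^{A_{s-1} \cup B}}(0^{n_s}) = val$, and (1)

Boundary($B$, $n_s$) $\equiv 0 \pmod 2$ $\Rightarrow$ $\mathrm{gap}_{N_s^{A_{s-1} \cup B}}(0^{n_s}) = 0$. (2)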

Let U = {w € I pos{w) is prime, and < pos{w) < | • 2"'*“^}. Fix an 

arbitrary w G U. Choose a set C Tfl 27^”'’ satisfying (a) Promise(Cuj, rig), and (b) 
Boundary (Cii,, Ug) = pos{w) — 1. Such a set Ci,, always exists because 2"'* —pg{jig) > 
I • 2”'’“^. Statements (1) and (2) in particular imply that, for all D.^, C T n wS"‘, it 
holds that 

Witcount(Du,, w) = 0 (/ap a^_iuc„ub„( 0"'’) = 0, and (3) 

s 

Witcount(Z?uj, ty) = pos(w) pap (O”") = val. (4) 

Henceforth, we use N to denote \\T fl w27”'’ ||. Let xi, . . . , xn be the lexicographic 
enumeration of the strings in T fl . We dehne Sw to be the function {0, 1}^ — Z 
that has the following property. For all C T fl , 

Sw{xd„{xi),xdA^ 2 ), ■ ■ ■ ,Xd^{xn)) = gap (0”^). (5) 

We will show that can be represented by a multilinear polynomial having low total 
degree. For arbitrary Zi, . . . , zat G {0, 1}, we call a computation path p of Ns'\qP‘) 
“{zi , . . . , Ziv) -allowable” if, along p, all queries q G ^g-i U Cu, have a “yes” answer, 
all queries q ^ Ag - 1 U Cw U (T fl w27"'* ) have a “no” answer, all queries Xi with Zi = 1 
are answered “yes”, and all queries Xi with Zi = 0 are answered “no”. Let zi, . . . ,zn G 
{0, 1}, and p be a (zi, . . . , zat) - allowable path of wi'^(0"'*). Let , . . . ,Xi^, where 
£ < pg{n.s) < jrif. < 2"®“^, be the distinct queries to strings in T fl wS'^‘ along 
p. Create a monomial mono(p) that is the product of terms Xk, k = 1,2,... where 
7fc = Vik if = 1, and 7 fe = (1 - otherwise. Let 

sign(7Vg,0”7p) •mono(p). 

zi ,... 2 iv G{0,1} p:p is {zi , 2 iv)-allowable 

It is easy to see that the thus constructed multilinear polynomial s'^{yi,... ,iin) 
coincides with Sw on {0,1}^, and has total degree < pg{rig) < 2"'’“^/ng < 
pos{w) < N/2. Statements (3) and (4) imply that for all zi, . . . , zat G {0, 1} such 
that X)i=i = pos(w), 

s„(zi, . . . , zn) = val, and s„(0, 0, . . . , 0) = 0. 

It follows from Lemma 3 that pos(w) | val. Therefore, for each w G U, pos{w) \ val. 
Hence, 

val > pos{w) > 211^11 > 

wGU 
where the fourth inequality follows from Lemma 1 and the fifth inequality follows
because, 2”'’ > 4n^pg{ns). However, val < because the running time of 

is bounded by Psirig). Thus, for each s > 1, Ag_i can always be extended 
in stage s. | (Claim and Theorem 9) 

Corollary 6. $(\exists A)[\mathrm{WPP}^A \not\subseteq \mathrm{LWPP}^A]$.
Abstract. We investigate the greedy algorithm for the shortest common
superstring problem. For a restricted class of orders in which strings are
merged, we show that the length of the greedy superstring is upper-
bounded by the sum of the length of an optimal superstring and an
optimal cycle cover. Thus in this restricted setting we verify the well
known conjecture that the performance ratio of the greedy algorithm is
within a factor of two of the optimum, and actually extend the conjecture
considerably.

We achieve this by systematically combining known conditional inequalities
about overlaps, period- and string-lengths, with a new family of
string inequalities. It can be shown that conventional systems of conditional
inequalities, including the Monge inequalities, are insufficient to
obtain our result.



1 Introduction 

We investigate the problem of finding a shortest common superstring: 

Given a set $S = \{s_1, \ldots, s_n\}$ of strings, determine a string
of minimum length which contains each $s_i$ as a substring.

(Obviously, we may assume that no string in $S$ contains another string in $S$ as
a substring.) The shortest common superstring problem models the sequence assembly
problem in shotgun sequencing, a fundamental problem in bioinformatics.
Each string in the set $S$ represents one of the sequenced DNA fragments created
by shotgun sequencing and the assembly problem is to deduce the original DNA
string from its set $S$ of sequenced fragments.

Blum et al. [1] show that the shortest common superstring problem is APX-complete,
hence polynomial time approximation schemes are not to be expected.

A simple greedy algorithm is the basis of the currently best approximation 
algorithms. The greedy algorithm repeatedly merges the two strings with maxi- 
mum overlap until only one string remains. The concept of an overlap is defined 
as follows. 

* Partially supported by DFG grant SCHN 503/2-1 
Definition 1. Let $a$ and $b$ be two strings. $\langle a, b \rangle$ denotes the longest proper suffix
of $a$ which is also a proper prefix of $b$. The overlap of $a$ and $b$, denoted by $(a, b)$,
is the length of $\langle a, b \rangle$.



GREEDY ALGORITHM

1. INPUT: A set $S$ of $n$ strings

2. while $|S| > 1$

a) choose $a, b \in S$ such that $a \neq b$ and $(a, b)$ is maximal

b) let $c$ be the string obtained by concatenating $a$ and the suffix of $b$ that
is not part of $\langle a, b \rangle$ /* note that c is a superstring of a and b */

c) let $S := (S \cup \{c\}) \setminus \{a, b\}$

3. OUTPUT: The one string left in $S$ /* since we obtain a superstring in every
merge in step 2(b) we output a superstring of S. */

If we allow cycles to be closed by removing the constraint $a \neq b$ in step 2(a) and insert
the period of the closed cycle (see Definition 3) into a cycle cover rather than
inserting $c$ into $S$, then we obtain the cyclic greedy algorithm which determines
a cycle cover of minimum length [1].

The length of the greedy superstring in relation to the length of the optimal
superstring - the performance ratio - has been the subject of a large body of research.
The following example shows that the ratio is at least 2: Let $x = c(ab)^n$, $y =
(ba)^n$ and $z = (ab)^n d$. The nonzero overlaps are $(x, y) = 2n - 1$, $(y, z) = 2n - 1$
and $(x, z) = 2n$. Thus GREEDY first joins $x$ and $z$ obtaining $c(ab)^n d$ as a new
string that has zero overlap with $y$ in both directions. Hence GREEDY delivers
the superstring $c(ab)^n d (ba)^n$ of length $4n + 2$. Obviously the solution $c(ab)^{n+1} d$
of length $2n + 4$ is better and the length ratio approaches 2.
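
The example can be replayed with a direct implementation of the overlap and of GREEDY. This is a minimal sketch for illustration, not the authors' code; ties between equal overlaps are broken by scan order, which is an assumption the paper does not fix.

```python
def overlap(a, b):
    """Length of the longest proper suffix of a that is also a proper prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Repeatedly merge the ordered pair (a, b), a != b, with maximum overlap."""
    s = list(strings)
    while len(s) > 1:
        best = None  # (overlap, index of a, index of b)
        for i, a in enumerate(s):
            for j, b in enumerate(s):
                if i != j:
                    k = overlap(a, b)
                    if best is None or k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = s[i] + s[j][k:]  # a followed by the part of b outside the overlap
        s = [t for idx, t in enumerate(s) if idx not in (i, j)] + [merged]
    return s[0]

if __name__ == "__main__":
    n = 3
    x, y, z = "c" + "ab" * n, "ba" * n, "ab" * n + "d"
    g = greedy_superstring([x, y, z])
    opt = "c" + "ab" * (n + 1) + "d"
    print(len(g), len(opt))  # lengths 4n + 2 and 2n + 4
```

The quadratic scan over pairs is chosen for clarity; it is enough to reproduce the lengths $4n + 2$ and $2n + 4$ from the example.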

Blum et al. [1] have shown that the greedy algorithm provides a 4-approximation,
which was the first constant factor approximation achieved for the problem.
Up to this day no further improvements on the bounds of the greedy algorithm
have been made. It is widely conjectured, and the main subject of our paper, that the
performance ratio of the greedy algorithm is in fact 2 [2,1,4].

Also in [1] a modified version of the greedy algorithm achieving a 3-approximation
is introduced. The current world record is due to Sweedyk [4] who obtains
a $2\frac{1}{2}$-approximation.

We extend the Greedy conjecture to cycle covers as follows. 

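Definition 2. Let $s_1, \ldots, s_n$ be strings. The length $L^*(C)$ of a given cycle $C =
(s_{i_1}, \ldots, s_{i_k})$ is defined as $L^*(C) = \sum_{j=1}^{k} |s_{i_j}| - \sum_{j=1}^{k-1} (s_{i_j}, s_{i_{j+1}}) - (s_{i_k}, s_{i_1})$.

A cycle cover $C$ of $s_1, \ldots, s_n$ decomposes the set of strings into disjoint cycles
$C_1, \ldots, C_r$. The length of $C$ is defined as $L^*(C) = \sum_{i=1}^{r} L^*(C_i)$.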

Without loss of generality we will assume that GREEDY picks the set
of pairs $\{(s_1, s_2), \ldots, (s_{n-1}, s_n)\}$. Observe that $L(\mathrm{GREEDY}) = \sum_{i=1}^{n} |s_i| -
\sum_{i=1}^{n-1} (s_i, s_{i+1})$ is the length of the greedy superstring. If we merge the last
string $s_n$ with the first string $s_1$, then we obtain a cyclic string of length
$L^*(\mathrm{GREEDY}) = L(\mathrm{GREEDY}) - (s_n, s_1)$.

We compare the length $L^*(\mathrm{GREEDY})$ of the cyclic superstring delivered by
GREEDY with any two cycle covers $C_1$, $C_2$ which expand in the following sense:
for any proper subset $S'$ of $S$, $C_1$ and $C_2$ do not both contain a cycle cover of $S'$.

We will have to restrict our attention to those inputs for which GREEDY
determines a linear greedy order, i.e. GREEDY starts with a pair $(s_i, s_{i+1})$ and
at any time, when strings $s_l, \ldots, s_k$ are already connected, either picks $(s_{l-1}, s_l)$
or $(s_k, s_{k+1})$. Observe that there are $2^{n-2}$ linear greedy orders out of a total of
$(n - 1)!$. As all greedy orders for three strings are linear, the previous example
also shows a performance ratio of at least two for linear greedy orders.
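
The count of linear greedy orders can be confirmed by enumeration for small $n$. In this sketch an order is modelled as a permutation of the edge indices $1, \ldots, n-1$, where edge $i$ stands for the pair $(s_i, s_{i+1})$; this encoding and the function names are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial

def is_linear(order):
    """True if every prefix of the order is a contiguous block of edge indices."""
    lo = hi = order[0]
    for e in order[1:]:
        if e == lo - 1:
            lo = e          # extend the connected block to the left
        elif e == hi + 1:
            hi = e          # extend the connected block to the right
        else:
            return False
    return True

def count_linear_orders(n):
    """Number of linear greedy orders for n strings (n >= 2)."""
    return sum(is_linear(p) for p in permutations(range(1, n)))

if __name__ == "__main__":
    for n in range(3, 8):
        print(n, count_linear_orders(n), 2 ** (n - 2), factorial(n - 1))
```

For $n = 3$ both counts equal 2, which matches the remark that all greedy orders for three strings are linear.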

Theorem 1. If GREEDY determines a linear greedy order, then

$L^*(\mathrm{GREEDY}) \leq L^*(C_1) + L^*(C_2)$,

where $C_1$ and $C_2$ are two arbitrary, expanding cycle covers.

As the expansion property is always fulfilled if one of the cycle covers consists 
of a single cycle, we can, as a consequence of Theorem 1, compare the length of 
the greedy superstring with the length of an optimal superstring. 

Corollary 1. If GREEDY determines a linear greedy order, then
$L(\mathrm{GREEDY}) \leq \mathrm{opt}_{string} + \mathrm{opt}_{cycle}$,

where $\mathrm{opt}_{string}$ is the length of the shortest superstring and $\mathrm{opt}_{cycle}$ is the length
of the shortest cycle cover.

Thus we show that the difference between greedy and optimal string length is 
bounded by the length of an optimal cycle cover and obtain an extension of 
the original greedy conjecture, since the length of the shortest cycle cover will 
in general be considerably smaller than the length of the shortest superstring. 
Although we are not aware of any counterexample for the general case, the proof 
only applies to the restricted case of linear greedy orders. 

Our argument proceeds as follows. We represent a cycle cover $C$ as a union
of disjoint cycles on the vertices $s_1, \ldots, s_n$. If $E$ denotes the corresponding set of
edges, then $L^*(C) = \sum_{i=1}^{n} |s_i| - \sum_{(s,t) \in E} (s, t)$, where we abuse notation as $(s, t)$
denotes the overlap as well as the edge. To show Theorem 1 it suffices to verify
the inequality

$\sum_{(s,t) \in E_1} (s, t) + \sum_{(s,t) \in E_2} (s, t) \leq \sum_{i=1}^{n} |s_i| + \sum_{i=1}^{n-1} (s_i, s_{i+1}) + (s_n, s_1)$, (1)

where $E_i$ is the set of edges of $C_i$.
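
For reference, a short sketch of why inequality (1) suffices, using only the definitions of $L^*$ for cycle covers and of $L^*(\mathrm{GREEDY})$ given earlier:

```latex
\begin{align*}
L^*(\mathrm{GREEDY}) &= \sum_{i=1}^{n} |s_i| - \sum_{i=1}^{n-1} (s_i, s_{i+1}) - (s_n, s_1), \\
L^*(C_1) + L^*(C_2) &= 2 \sum_{i=1}^{n} |s_i| - \sum_{(s,t) \in E_1} (s,t) - \sum_{(s,t) \in E_2} (s,t).
\end{align*}
```

Hence $L^*(\mathrm{GREEDY}) \leq L^*(C_1) + L^*(C_2)$ holds exactly when the two overlap sums over $E_1$ and $E_2$ are bounded by $\sum_{i=1}^{n} |s_i| + \sum_{i=1}^{n-1} (s_i, s_{i+1}) + (s_n, s_1)$, which is inequality (1).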

To verify (1) we apply a sequence of conditional linear inequalities to the
left hand side to obtain the right hand side. We construct such a sequence by
means of a game played on an $n \times n$ board of overlaps. Initially we place one
mouse on each position $(i, j)$ corresponding to an edge $(s_i, s_j) \in E_1 \cup E_2$. Our
goal is to move the mice to their holes, namely their final positions $(i, i)$, $(i, i+1)$
and $(n, 1)$ corresponding to the string lengths $|s_i|$ and to the overlaps $(s_i, s_{i+1})$
and $(s_n, s_1)$. An initial setup is shown in Fig. 1. Moves of this game comprise
for instance the well known Monge inequality and the triple move, one of our
contributions in this paper. (The formal definition of the game can be found in
Section 3.)



[Figure 1: a 4 x 4 board of overlaps with cells $(s_i, s_j)$. The holes are the diagonal cells
$(s_i, s_i)$ and the greedy cells $(s_1, s_2)$, $(s_2, s_3)$, $(s_3, s_4)$, $(s_4, s_1)$. Two mice sit on
$(s_1, s_3)$ and one mouse each on $(s_2, s_1)$, $(s_2, s_4)$, $(s_3, s_1)$, $(s_3, s_2)$, $(s_4, s_2)$ and $(s_4, s_4)$.]

Fig. 1. An example setup as it would arise for the two cycle covers $C_1 = \{(1, 3), (2, 4)\}$
and $C_2 = \{(1, 3, 2), (4)\}$. The expanding property is reflected by the fact that for any set
$I \subset \{1, 2, 3, 4\}$ there is at least one mouse in $(I \times (\{1, 2, 3, 4\} \setminus I)) \cup ((\{1, 2, 3, 4\} \setminus I) \times I)$.
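
The initial placement of mice and the condition stated in the caption of Fig. 1 can be sketched as follows. The representation of a cycle cover as a list of index tuples and the names `mice_from_covers` and `crossing_condition` are assumptions for illustration; the check ranges over nonempty proper subsets $I$.

```python
from collections import Counter
from itertools import combinations

def mice_from_covers(*covers):
    """Each cover is a list of cycles (tuples of string indices).

    Returns a Counter mapping board cells (i, j) to the number of mice placed there.
    """
    mice = Counter()
    for cover in covers:
        for cycle in cover:
            for k, i in enumerate(cycle):
                j = cycle[(k + 1) % len(cycle)]  # successor of s_i in its cycle
                mice[(i, j)] += 1
    return mice

def crossing_condition(mice, n):
    """For every nonempty proper subset I of {1..n}, some mouse sits on a cell
    with exactly one coordinate in I."""
    indices = list(range(1, n + 1))
    for size in range(1, n):
        for subset in combinations(indices, size):
            I = set(subset)
            if not any((i in I) != (j in I) for (i, j) in mice):
                return False
    return True

if __name__ == "__main__":
    c1 = [(1, 3), (2, 4)]
    c2 = [(1, 3, 2), (4,)]
    mice = mice_from_covers(c1, c2)
    print(sorted(mice.items()), crossing_condition(mice, 4))
```

Two identical covers such as `[(1, 2), (3, 4)]` fail the check for `I = {1, 2}`, since both contain a cycle cover of that subset.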



A diagonal has two interpretations, first as the length of the string $|s_i|$ and
second as the self overlap $(s_i, s_i)$. In the first (resp. second) case we say that
a mouse strongly (resp. weakly) occupies the hole. (In the above example the
mouse on $(s_4, s_4)$ weakly occupies the cell.) When applying a move we have to
guarantee that the total sum of overlaps and string lengths represented by the
mice doesn’t decrease. A winning sequence of moves thus corresponds to deriving
the intended bound (1).

We present the crucial triple move in section 2 where we also collect further 
string properties. The argument for Theorem 1 completes in section 3 by a 
careful sequencing of the moves. Conclusions and open problems are presented 
in section 4. Due to space limitations we do not prove that Monge moves alone 
are insufficient. This fact and a proof of Corollary 1 can be found in [5]. 

2 Strings and Moves 

Definition 3. Let $a$, $b$ and $p$ be strings. We say that $a$ is $p$-periodic or has
period $p$, iff $a$ is a substring of $p^k$ for some $k \in \mathbb{N}$. If $p_a$ is a shortest string, so
that $a$ is $p_a$-periodic, then $p_a$ is a minimum period of $a$.

When GREEDY chooses to merge two strings $a$ and $b$ by picking the pair
$(a, b)$ and $x$ is an arbitrary different string, the options to pick $(a, x)$, $(x, b)$ or
$(b, a)$ are thereby eliminated. As GREEDY picks the maximum overlap over all
possible choices, we know that the offdiagonal cell $(a, b)$ represents a value at
least as big as the value of every cell whose pair was still available at the time. We
assign a rank of $n - i$ to the cell that was picked in the $i$th iteration of GREEDY
and to every cell thereby eliminated.

2.1 Insertions

Lemma 1. Let $a$, $b$ be strings and let $p_a$ be a minimum period of $a$.

$(a, b) < |a| \wedge (a, b) < |b|$ (2)

$(a, b) \leq (s_i, s_{i+1})$ if $\mathrm{rank}(a, b) \leq \mathrm{rank}(s_i, s_{i+1})$ (3)

$(a, a) = |a| - |p_a|$ (4)

As an immediate consequence we may move a mouse from an arbitrary position
to any greedy hole that is ranked at least as high, and we may move a
mouse from a cell $(i, j)$ to the $i$th or $j$th diagonal, strongly filling that hole. We
call these moves Greedy-Insertions and Diagonal-Insertions respectively; the special
case reflecting (4) is called discarding a period.

2.2 The Monge Inequalities 

The next fundamental inequality was observed first by Gaspard Monge [3] in a 
related context. For the sake of completeness we have included the proof in [5]. 

Lemma 2. Let $a, b, c, d$ be strings. Then the inequality

$(a, c) + (b, a) \leq |a| + (b, c)$ (5)

holds. Furthermore, if $(a, b) \geq (a, d)$ and $(a, b) \geq (c, b)$, then also

$(a, d) + (c, b) \leq (a, b) + (c, d)$. (6)

Inequality (5) represents a Diagonal Monge involving two mice which move
from $(a, c)$ and $(b, a)$ to $(b, c)$ and $(a, a)$, the latter position being strongly filled.
The interpretation of inequality (6) is analogous. We call the corresponding move
a Greedy Monge to reflect that the required dominances have to be justified by
the partial order of GREEDY. See Fig. 2 for an illustration of (5) and (6) as
moves in the mice game.
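
Equation (4) and the two Monge inequalities (5) and (6) can be checked exhaustively over all short strings. The sketch below does this over the alphabet {a, b}; the alphabet, the length bound and the helper names are assumptions for illustration, and such a check illustrates the statements without replacing their proofs.

```python
from itertools import product

def overlap(a, b):
    """Length of the longest proper suffix of a that is also a proper prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def min_period_length(a):
    """Smallest q >= 1 such that a is a substring of p^k for some string p of length q."""
    for q in range(1, len(a) + 1):
        if all(a[i] == a[i + q] for i in range(len(a) - q)):
            return q

def all_strings(max_len, alphabet="ab"):
    return ["".join(t) for n in range(1, max_len + 1) for t in product(alphabet, repeat=n)]

def check_inequalities(max_len=3):
    S = all_strings(max_len)
    ov = {(a, b): overlap(a, b) for a in S for b in S}
    # (4): the self overlap equals the length minus the minimum period length.
    if any(ov[a, a] != len(a) - min_period_length(a) for a in S):
        return False
    # (5): Diagonal Monge.
    if any(ov[a, c] + ov[b, a] > len(a) + ov[b, c] for a, b, c in product(S, repeat=3)):
        return False
    # (6): Greedy Monge, only required when (a, b) dominates (a, d) and (c, b).
    for a, b, c, d in product(S, repeat=4):
        if ov[a, b] >= ov[a, d] and ov[a, b] >= ov[c, b]:
            if ov[a, d] + ov[c, b] > ov[a, b] + ov[c, d]:
                return False
    return True

if __name__ == "__main__":
    print(check_inequalities(3))
```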




[Figure 2: two panels, Diagonal (5) and Greedy (6).]

Fig. 2. Moves based on the Monge inequality

2.3 The Triple

Lemma 3. Let $a$, $b$, $c$, $d$ and $x$ be strings with

$\max\{(a, x), (x, b)\} \geq (a, b), (x, d), (c, x)$.

Then

$(a, b) + (x, d) + (c, x) \leq (a, x) + (x, b) + (c, d) + |p_x|$.

Proof. We proceed by induction on $(x, x)$. For $(x, x) = 0$ we have $p_x = x$. The
Monge inequality gives us $|x| + (c, d) \geq (c, x) + (x, d)$. As we know that $(a, b)$ is
no larger than the maximum of $(x, b)$ and $(a, x)$, we have verified our claim.

Now assume that the claim is shown for all $x$ with $(x, x) \leq k$. Let $(x, x) = k + 1$
and assume that the premise of the lemma is fulfilled.

First we eliminate the case $\max\{(a, x), (x, b)\} \geq (x, x)$. We assume w.l.o.g.
that $(a, x) \geq (x, b)$. As $(a, x)$ dominates $(a, b)$ we obtain

$(x, x) + (a, b) \leq (a, x) + (x, b)$ (7)

with the Monge inequality. The Monge inequality also gives us

$(c, x) + (x, d) \leq |x| + (c, d)$. (8)

Adding (7) and (8) and subtracting $(x, x)$ on both sides yields our claim.

Now we may assume $(x, x) > (a, x), (x, b)$ and since every overlap $(a, b)$, $(x, d)$
and $(c, x)$ is dominated by $(a, x)$ or $(x, b)$ we know that $(x, x)$ also dominates
these overlaps. With this knowledge we may conclude: $(a, x) = (a, \langle x, x \rangle)$, $(x, b) =
(\langle x, x \rangle, b)$, $(c, x) = (c, \langle x, x \rangle)$ and $(x, d) = (\langle x, x \rangle, d)$.

Assuming the premise of the lemma we can infer that the larger of $(a, \langle x, x \rangle)$
and $(\langle x, x \rangle, b)$ dominates $(a, b)$, $(\langle x, x \rangle, d)$ and $(c, \langle x, x \rangle)$. Hence we can use the
induction hypothesis to obtain

$(a, b) + (\langle x, x \rangle, d) + (c, \langle x, x \rangle) \leq (a, \langle x, x \rangle) + (\langle x, x \rangle, b) + (c, d) + |p_{\langle x, x \rangle}|$,
which is equivalent to

$(a, b) + (x, d) + (c, x) \leq (a, x) + (x, b) + (c, d) + |p_{\langle x, x \rangle}|$.

Now we only need to observe that $|p_{\langle x, x \rangle}| \leq |p_x|$, which clearly holds as $\langle x, x \rangle$ is
a substring of $x$. □

Thus we may move mice according to the diagrams in Fig. 3(a) and 3(b). 

3 Winning Strategies 



Before defining our algorithm we introduce notation for subboards (see Fig. 3(c)).

[Figure 3: three panels.]

Fig. 3. (a) Vertical Triple. (b) Horizontal Triple. (c) Illustration for Definition 4.


Definition 4. 1. The subboard $\mathrm{Board}_{i,j}$ (with $i \leq j$) is the set of all cells in
the intersection of rows $i$ through $j$ and columns $i$ through $j$.

2. Let $B = \mathrm{Board}_{i,j}$. The horizontal [vertical] frame of $B$ is the set of all cells
belonging to rows [columns] $i$ through $j$ but not to $B$. The frame of $B$ is the
union of the horizontal and vertical frame of $B$.

3. Let $B = \mathrm{Board}_{i,j}$. We define $G_1(B)$ to be the greedy cell in position $(j, j + 1)$,
if existing, and $G_2(B)$ to be the greedy cell in position $(i - 1, i)$, if existing.
We further define the neighbouring diagonal cells, $N(G_1(B)) := (j + 1, j + 1)$
and $N(G_2(B)) := (i - 1, i - 1)$, if existing.
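
The notions of Definition 4 translate directly into sets of cells. The following sketch assumes 1-based cell coordinates on an $n \times n$ board and treats a greedy cell as existing only when both coordinates lie in $1, \ldots, n$; the function names are illustrative.

```python
def board(i, j):
    """Cells of Board_{i,j}: rows i..j intersected with columns i..j."""
    return {(r, c) for r in range(i, j + 1) for c in range(i, j + 1)}

def horizontal_frame(i, j, n):
    """Cells in rows i..j that do not belong to Board_{i,j}."""
    return {(r, c) for r in range(i, j + 1) for c in range(1, n + 1)} - board(i, j)

def vertical_frame(i, j, n):
    """Cells in columns i..j that do not belong to Board_{i,j}."""
    return {(r, c) for r in range(1, n + 1) for c in range(i, j + 1)} - board(i, j)

def frame(i, j, n):
    return horizontal_frame(i, j, n) | vertical_frame(i, j, n)

def greedy_cells(i, j, n):
    """Return (G1, N(G1), G2, N(G2)); entries are None where the cell does not exist."""
    g1 = (j, j + 1) if j + 1 <= n else None
    n1 = (j + 1, j + 1) if j + 1 <= n else None
    g2 = (i - 1, i) if i - 1 >= 1 else None
    n2 = (i - 1, i - 1) if i - 1 >= 1 else None
    return g1, n1, g2, n2

if __name__ == "__main__":
    print(sorted(horizontal_frame(2, 3, 4)))
    print(greedy_cells(2, 3, 4))
```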

Definition 5. We call a move of two mice rectangular, if they are initially on 
two opposite corners (a, b) and (c, d) of a rectangle and move to the other two 
corners (c, b) and (a, d) . 

A Diagonal or Greedy Monge move is rectangular. A triple move can be represented
as two rectangular moves even though they are not individually valid.

Lemma 4. Assume that every row and every column of the board contains two 
mice. A rectangular move can only empty the frame of a subboard B, if one of the 
participating mice leaves the horizontal frame, the other leaves the vertical frame, 
and exactly one of them enters B. Furthermore the frame of B is nonempty if 
and only if the horizontal and the vertical frame are nonempty. 

Proof. Obvious. □ 

3.1 The Rank Descending Algorithm 

We will describe the Rank Descending Algorithm RDA which is defined on a 
board for a linear greedy order. Our RDA will systematically grow a subboard 
B beginning with a single diagonal until the subboard coincides with the entire 
board. The name RDA is assigned, as we move mice of higher rank into the holes 
first. The algorithm is defined as follows.

The Rank Descending Algorithm

(1) The input consists of two cycle covers $C_1$, $C_2$ and a linear greedy sequence.

(2) Preprocessing

(2a) Place a mouse on position $(i, j)$, if string $s_j$ immediately follows $s_i$ in $C_1$
or in $C_2$. If $s_j$ is the immediate successor of $s_i$ in both $C_1$ and $C_2$, then
$(i, j)$ receives two mice. If a mouse is placed on a diagonal $(i, i)$ it weakly
occupies $(i, i)$.

(2b) Let $i$ be the row of the highest ranked greedy cell. Set $B = \mathrm{Board}_{i,i}$.

(2c) If $(i, i)$ contains a mouse, then discard its period. Otherwise execute a
Diagonal Monge in $(i, i)$.

(3) Main Loop.

RDA will establish Theorem 1. In the context of Corollary 1 we are given
a superstring and a cycle cover which translate into $2n - 1$ mice. We insert an
additional mouse on the intersection of the row containing a single mouse and
the column containing a single mouse, thus closing the superstring to obtain a
second cycle cover. An additional postprocessing step which determines a winning
sequence without the additional mouse introduced is described in [5]. We
show that the main loop preserves the following invariants:

I1 Every row and every column intersecting $B$ contains 2 mice.

I2 Every hole in $B$ is filled.

I3 No diagonal outside of $B$ is strongly occupied.

I4 No diagonal contains more than one mouse.

I5 For all subboards $B'$ with $B \subseteq B' \subsetneq \mathrm{Board}_{1,n}$ the frame of $B'$ is not empty.

Lemma 5. The invariants I1, ..., I5 hold before the main loop starts.

Proof. I1, I3 and I4 are obvious and I2 is valid after the first move (2c). I5
holds since we demand the expansion property. □

We call a Diagonal Monge, Greedy Monge, Triple, Discard of Period legal, if it 
doesn’t violate any of the invariants listed above. We are now ready to introduce 
the main loop of our algorithm. 



while $B \neq \mathrm{Board}_{1,n}$: 

Let $G$ be the higher ranked of the two greedy cells $G_1(B)$ and $G_2(B)$. 

[Flowchart of the loop body. Decision nodes: "Does G contain a mouse?", "Does N(G) contain a mouse?" (asked on both branches), "Is the Triple in G, N(G) legal?". Action nodes: "Discarding a Period in N(G)" (twice), "Greedy-Monge in G" (Obs. 3), "Greedy-Monge in G" (Obs. 2), "Triple" (Obs. 1), "Diagonal-Monge in N(G)" (Obs. 3).] 

Let $B$ be the smallest subboard that contains the old $B$ and $G$. 
For the following arguments, we call a mouse free if it is not on a diagonal 
cell, not in a greedy hole belonging to $B$, and not on $G$. 

Now we need to argue why the moves described exist and that they maintain 
our invariants. As we only discard periods, apply Monges or triple moves, all of 
which leave the number of mice per row and column unchanged, invariant I1 
is automatically satisfied. Since we grow the subboard $B$ by a diagonal only if 
the two new holes are filled, invariant I2 follows once we show that the claimed 
moves exist. I3 is valid since if a diagonal $(s_i, s_i)$ is filled indirectly, that is, not 
by a Monge in $(s_i, s_i)$, then it is only weakly occupied. 

The existence of the moves described in the algorithm as well as invariants 
I4 and I5 will be shown in a case analysis. (Within the table we have noted 
the corresponding observations for each case of our main loop.) We assume 
without loss of generality that $G_1(B)$ is ranked higher than $G_2(B)$, and we set 
$G = G_1(B)$. In the following we state three lemmata which show the legality of 
the moves performed, provided invariants I1 through I5 hold before the move. 




Fig. 4. Illustration for Observation 1 



Observation 1. Assume that $G$ does not contain a mouse and that $N(G)$ is 
weakly filled (see Fig. 4). Then the triple move in $G$ and $N(G)$ can be executed. 

Proof. Observe that there is no free mouse in the bottom row of $B$: assume a free 
mouse sits somewhere in the bottom row of $B$. In this case every row restricted 
to $B$ contains two mice and thus the horizontal frame of $B$ is empty, in violation 
of I5. 

Note that $G$ dominates every cell of its row except those in $B$. $G$ also dominates 
the cells of the row of $N(G)$ that might hold the free mouse of this row 
and the free mouse of $G$'s column. Thus the conditions for the triple move are 
fulfilled. □ 

Since the main loop executes a triple only if it is legal, the invariants will be 
maintained after executing a triple. 

The following observation is used throughout: if a subboard contains $(a, b)$ 
then it also contains $(a, a)$ and $(b, b)$. 

Observation 2. Assume that $G$ does not contain a mouse, but $N(G)$ weakly 
contains a mouse. If the triple in $G$ and $N(G)$ is illegal, then the greedy monge 
in $G$ is legal. 

Proof. In the given setup there is exactly one free mouse in $G$'s column and one 
free mouse in $G$'s row. Observe that a triple move is only illegal if it violates I4 
(overloading a diagonal) or violates I5 (clearing a frame). 

Case 1: The triple move overloads a diagonal. (See Fig. 5(a).) 

By case assumption the diagonal $w$ is filled by a mouse. Observe that the 
mice $x$ and $y$ cannot be in the same column since otherwise their column contains 
three mice. Hence the greedy monge move cannot overload a diagonal either. 

We verify I5. Assume that $B'$ containing $B$ exists, whose frame gets cleared. 
As a consequence of Lemma 4, neither $z$ nor $x$ belongs to $B'$. In particular the 
diagonal $w$ doesn’t belong to $B'$ either. If $y$ belongs to $B'$, then $w$ as the diagonal 
in its column would belong to $B'$ as well. 

We have assumed that the greedy monge in $G$ violates I5 for $B'$. In particular, 
due to Lemma 4, $G$ and hence $N(G)$ belongs to $B'$ and thus $y$ belongs to 
the horizontal frame of $B'$. Since $y$ is not moved, I5 remains valid. 

Case 2: The triple move empties the frame of a subboard B' containing B. 

We regard the triple move as two consecutive rectangular moves - the first 
involving the mice on x and N(G) - the second involving the mice from y and 
z. As to the first move, it is obvious, that it can’t clear a frame, since a mouse 
on a diagonal does not belong to any frame. 

So we assume that the second move clears the frame of B' . Hence cells y 
as well as z do not belong to B' . The mouse from x must be in B' , since B is 
a subboard of B' and otherwise x would remain in the frame of B' . Thus x is 
located closer to B than y (with respect to column distance) and we get Fig. 
5(b). 




Fig. 5. Proof of Observation 2 

Why is I4 maintained for the greedy move? If the greedy monge fills a diagonal 
indirectly, then this diagonal has to belong to $B'$ since $x$ belongs to $B'$. 
Hence $z$ belongs to $B'$ which, as we have seen, it does not. 

Assume that invariant I5 is violated by the greedy monge for a subboard $B''$ 
containing $B$. $G$ and hence $N(G)$ belong to $B''$, since both mice on $x$ and $z$ leave 
the frame and one enters $B''$. Consequently $y$ must belong to $B''$ since it may 
not be in its frame. But then $x$ belongs to $B''$ as well since it has to be located 
closer to $B$, as we have seen above. Therefore its frame is not cleared. □ 



Observation 3. Assume that P is a greedy or a diagonal cell such that there 
are exactly two free mice in the row of P and one free mouse in the column of P 
which are all dominated by P. Then there exists a monge in P that is legal. The 
same statement holds if the column of P contains two free mice and the row of 
P contains one free mouse. 

Proof. We restrict ourselves to the case of one free mouse in the column of P 
and two free mice in the row of P. 

If the two free mice are on the same cell, no frame can be cleared (since 
one of the mice remains fixed) and there can’t be an occupied diagonal in their 
column. So this case yields no problems. 

If the monge with the inner of the two mice doesn’t overload a diagonal, the 
outer mouse stays in every horizontal frame that the inner one might leave. So 
we can concentrate on the case with the inner of the two monges overloading a 
diagonal. In this case the “outer” monge cannot overload a diagonal as well. So 
I4 is valid for the outer monge. 




Fig. 6. Illustration for Observation 3 

Assume the frame of $B'$ containing $B$ gets cleared by the outer monge 
involving $y$ and $z$. Then $x$ must belong to $B'$ and hence the occupied diagonal 
$w$ does. But then $z$ belongs to $B'$ and hence the frame is not cleared (see Fig. 6). □ 

Lemma 6. Invariants I1 through I5 hold after the main loop. 

Proof. We assume that the invariants hold before the main loop performs a 
move. If this move consists in discarding a period, then the invariants obviously 
remain valid. We have shown in Observation 1 that the triple move as required 
is executable and legal by assumption. In Observation 2 we have seen that the 
greedy monge is legal provided the triple move is illegal. The greedy monge (resp. 
the diagonal monge) assuming that $N(G)$ does not contain a mouse is covered 
in Observation 3: the greedy monge in $G$ will have two free mice in the column 
of $G$ and one in its row. As for the diagonal monge in $N(G)$, observe that there 
are two free mice in $N(G)$'s row and one free mouse in its column. □ 

4 Conclusion and Open Problems 

We have introduced the triple move and have thus shown, that conditional string 
inequalities are sufficient to settle the greedy superstring conjecture for linear 
greedy orders. The conjecture is implied by the stronger statement of Theorem 
1, and it may be, that this stronger version turns out to be easier to show. 

Of course the verification of the superstring conjecture for all greedy orders is 
the open problem. We think that the approach of conditional string inequalities, 
possibly augmented by other strategies, offers a promising line of attack. The 
Rank Descending Algorithm, due to our invariants, can at all times exploit that 
one of the neighbouring diagonals is not yet strongly filled. Such a guarantee 
cannot be made in the general case. 

We may evaluate a given set of linear string inequalities by determining 
the largest approximation ratio. Linear programming will deliver matrices of 
pseudo overlaps and pseudo string lengths fulfilling the given set of linear string 
inequalities. If an approximation ratio greater than 2 is achieved, then the hope 
is that a new conditional linear inequality explains why a string counterexample 
cannot be constructed. The triple move was found in that manner. 



Acknowledgements. Many thanks to Uli Laube for carefully implementing 
the mice game, allowing us to recognize several dead ends ahead of time. 
Abstract. The logic of Counters, Lambdas, and Uninterpreted functions 
(CLU) is a subset of first-order logic satisfying the twin properties 
that 1) the validity of a CLU formula can be decided by generating a 
Boolean formula and using a Boolean satisfiability (SAT) checker to show 
the formula is unsatisfiable, and 2) it has sufficient expressive power to 
construct models of a variety of interesting software and hardware sys- 
tems. We describe this logic and show its modeling capabilities. 



1 Introduction 

In many areas of computer science, we would like to formally verify systems that 
are best modeled as infinite state systems. Examples include: 

— Programs operating on integer data. Even though most programs are exe- 
cuted using a fixed word size approximation of integers, the ranges of values 
are so large that it is often best to consider them unbounded. 

— System designs containing arbitrarily large storage elements such as mem- 
ories and queues. Within this context, we can model the system as either 
having truly unbounded storage, or where the storage is of bounded size, but 
the bound is arbitrarily large. 

— Concurrent systems consisting of an arbitrarily large number of processes. 
We can consider the case where there are an infinite number of processes, or 
where the bound on the process count is arbitrarily large. 

Most of the commonly used formal verification tools, such as BDD and SAT- 
based model checkers [5,7], require a finite state model of the system. Thus, they 
must use finite word sizes for representing data, and they must assume a fixed 
configuration in terms of the buffer sizes and number of processes. The verifier 
can prove properties about this particular system configuration, but there could 
be bugs in the design that occur only in some other configuration, such as when 
more processes are added but the buffers are not also enlarged. Some work has 
been done to verify systems with many identical processes using a finite-state 
abstraction [19,17], but these approaches apply only to systems satisfying special 
structures and symmetries. 



P.K. Pandya and J. Radhakrishnan (Eds.): FSTTCS 2003, LNCS 2914, pp. 399-407, 2003. 
© Springer- Verlag Berlin Heidelberg 2003 







Verifying infinite-state systems requires using a more expressive logic than 
the Boolean logic traditionally used by automated verification tools. Of course, 
formal verification using theorem proving has long been based on first-order 
and higher-order logics, but these tools typically require considerable manual 
effort and expertise by the user. Our interest is in obtaining levels of automation 
similar to those seen with traditional model checking tools, while also getting 
the expressive power of first-order logic. 

We have identified a subset of first-order logic we call “CLU,” short for 
“Counters, Lambdas, and Uninterpreted functions.” We have found this logic 
surprisingly powerful in its ability to model different types of systems. To decide 
the validity of a CLU formula, we can transform it into an equivalence-preserving 
Boolean formula and then use a SAT checker to perform the actual validity test. 
Given the greatly increased capacities of modern SAT solvers [16], this approach 
has enabled us to handle decision problems that far exceed the capabilities of 
more classical theorem provers. Our UCLID verifier [3] allows the user to describe 
a system in a modeling language based on CLU and to apply different forms of 
verification to the model. 

In this paper, we give a brief overview of CLU, how it can be used to 
model infinite-state systems, and the different forms of verification supported 
by UCLID. 



bool-expr ::= true | false | ¬bool-expr | (bool-expr ∧ bool-expr) | (bool-expr ∨ bool-expr) 
            | (int-expr = int-expr) | (int-expr < int-expr) 
            | predicate-expr(int-expr, ..., int-expr) 
int-expr ::= int-var | ITE(bool-expr, int-expr, int-expr) 
           | succ(int-expr) | pred(int-expr) | function-expr(int-expr, ..., int-expr) 
predicate-expr ::= predicate-symbol | λ int-var, ..., int-var . bool-expr 
function-expr ::= function-symbol | λ int-var, ..., int-var . int-expr 

Fig. 1. CLU Expression Syntax. Expressions can denote computations of Boolean 
values, integers, or functions yielding Boolean values or integers. 



2 The Logic of Counters, Lambdas, and Uninterpreted 
Functions (CLU) 



Figure 1 shows the formal syntax of CLU. There are four data types: Booleans, 
integers, predicates mapping integers to Booleans, and functions mapping 
integers to integers. The UCLID verifier also supports enumerated types, but these 
are implemented with Boolean values using bit-level encodings. A CLU expression 
describes a means of computing one of these four value types. Boolean 
expressions consist of predicate applications, using either a predicate expression 
or one of the two interpreted predicates: equality and less-than between integers. 
We also allow the usual Boolean connectives. Integer expressions consist of function 
applications and increment (succ) and decrement (pred) operators applied 
to integers. We also include the if-then-else operation ITE to express condition- 
als. Both predicate and function expressions can consist of uninterpreted symbols 
or CLU expressions using lambda notation. By using uninterpreted symbols, the 
verifier can prove the correctness of a design for arbitrary choices of these func- 
tions and predicates. As a special case, predicate and function symbols of order 
0 (i.e., having no arguments) denote arbitrary Boolean and integer values. The 
formal parameters of a lambda expression must be integers, and so our logic 
lacks the expressive power, as well as the complexity, of a true lambda calculus. 

CLU does not include any quantifiers. Our decision procedure will declare 
a CLU formula to be valid if and only if it evaluates to true for all possible 
interpretations of the predicate and function symbols, and thus we can view 
these symbols as implicitly being universally quantified. 
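
The reading of validity as truth under every interpretation can be illustrated with a small brute-force sketch. It is not the decision procedure described here (which translates to a Boolean formula); it merely enumerates every function over a tiny finite domain, so a `True` result shows only that no counterexample exists in that fragment. The names `holds_on_small_interpretations`, `congruence`, and `injectivity` are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

# A formula over two integer symbols x, y and one uninterpreted function f.
Formula = Callable[[int, int, Callable[[int], int]], bool]

def holds_on_small_interpretations(formula: Formula, domain: Sequence[int]) -> bool:
    """Evaluate the formula for every x, y in domain and every f: domain -> domain."""
    for x, y in product(domain, repeat=2):
        for outputs in product(domain, repeat=len(domain)):
            table = dict(zip(domain, outputs))  # one interpretation of f
            if not formula(x, y, table.__getitem__):
                return False  # found a falsifying interpretation
    return True

def congruence(x, y, f):
    # (x = y) implies (f(x) = f(y)): true for every interpretation of f.
    return x != y or f(x) == f(y)

def injectivity(x, y, f):
    # (f(x) = f(y)) implies (x = y): falsified by any constant f.
    return f(x) != f(y) or x == y

print(holds_on_small_interpretations(congruence, [0, 1, 2]))
print(holds_on_small_interpretations(injectivity, [0, 1, 2]))
```

The first formula survives every interpretation in the fragment, while the second is refuted as soon as the enumeration reaches a constant function.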



3 Modeling Examples 

In this section, we describe implementations of several important data structures 
using CLU. In constructing a system model, we view its different components 
as abstract data types, using techniques such as those shown here to construct 
the component models. Even if the actual system implements the components 
using different structures, we can guarantee the soundness of the verification by 
using the most abstract and general component models satisfying the component 
specifications. 



3.1 Arrays 

Many verifiers support some form of one-dimensional array [6,15], where an 
integer index selects a particular array entry for reading or writing. Arrays can 
be used to model random-access memories, as well as the state elements in a 
system with many identical processes. 

We model an array as a function mapping an integer index to an integer 
array element. At any point in the system operation, the state of an array is 
represented by some function expression $M$. Typically, the initial state of the 
array is given by an uninterpreted function symbol $m_0$, indicating that the array 
has arbitrary contents. The effect of a write operation with integer expressions $I$ 
and $D$ denoting the position and data value being written is given by a function 
expression $M'$: 

$M' = \lambda j \,.\, \mathrm{ITE}(j = I,\ D,\ M(j))$ 

We can readily extend this idea to multidimensional arrays, using function 
expressions having multiple arguments. We can also express various forms of 
“parallel” updating, where multiple array elements are modified in a single step. 
For example, given an integer expression $T$ expressing a threshold, we can express 
the effect of incrementing all array elements having values less than $T$ with the 
function expression: 

$M' = \lambda j \,.\, \mathrm{ITE}(M(j) < T,\ \mathrm{succ}(M(j)),\ M(j))$ 
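
As an illustration of the two update expressions, the following sketch represents an array as a Python function and builds each new state as a closure over the old one. It works on concrete integers rather than symbolic CLU expressions, and the names `write`, `increment_below`, and `m0` are assumptions made for this example only.

```python
from typing import Callable

Array = Callable[[int], int]  # an array is a function from index to element

def write(m: Array, i: int, d: int) -> Array:
    """M' = lambda j . ITE(j = I, D, M(j))"""
    return lambda j: d if j == i else m(j)

def increment_below(m: Array, t: int) -> Array:
    """M' = lambda j . ITE(M(j) < T, succ(M(j)), M(j))"""
    return lambda j: m(j) + 1 if m(j) < t else m(j)

# m0 stands in for the uninterpreted initial contents; any total function would do.
m0: Array = lambda j: 7 * j
m1 = write(m0, 3, 100)        # single-element write
m2 = increment_below(m1, 20)  # "parallel" update of every element below 20

print(m1(3), m1(4))          # written entry and an untouched entry
print(m2(0), m2(2), m2(3))   # two incremented entries and one left alone
```

Because every state is a function of the previous one, no index range is ever fixed, which is the sense in which the array is unbounded.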



3.2 Queues 

A first-in, first-out queue of arbitrary length can be modeled as a record $Q$ 
having components Q.contents, Q.head, and Q.tail. Conceptually, the contents 
of the queue are represented as some subsequence of an infinite sequence, where 
Q.contents is a function expression mapping an integer index $i$ to the value 
of sequence element $i$. Q.head is an integer expression indicating the index of 
the head of the queue, i.e., the position of the oldest element in the queue. 
Q.tail is an integer expression indicating the index at which to insert the next 
element. $Q$ is modeled as having an arbitrary state by letting Q.contents = $c_0$, 
Q.head = $h_0$, and Q.tail = $t_0$, where $c_0$ is an uninterpreted function and $h_0$ and 
$t_0$ are symbolic constants satisfying the constraint $h_0 \le t_0$. This constraint is 
enforced by including it in the antecedent of the formula whose validity we wish 
to check. 

Testing if the queue is empty can be expressed quite simply as: 

$\mathit{isEmpty}(Q) = (Q.head = Q.tail)$ 

Using this operation we can define the following three operations on the queue: 

1. Pop(Q): The pop operation on a non-empty queue returns a new queue $Q'$ 
with the first element removed; this is modeled by incrementing the head. 

$Q'.head = \mathrm{ITE}(\mathit{isEmpty}(Q),\ Q.head,\ \mathrm{succ}(Q.head))$ 

2. First(Q): This operation returns the element at the head of the queue, 
provided the queue is non-empty. It is defined as $Q.contents(Q.head)$. 

3. Push(Q, X): Pushing data item $X$ into $Q$ returns a new queue $Q'$ where 

$Q'.tail = \mathrm{succ}(Q.tail)$ 

$Q'.contents = \lambda i \,.\, \mathrm{ITE}(i = Q.tail,\ X,\ Q.contents(i))$ 

Assuming we start in a state where $h_0 \le t_0$, Q.head will never be greater than 
Q.tail because of the conditions under which we increment the head. 

A bounded-length queue can be expressed as a circular buffer. We introduce 
two additional parameters Q.minidx and Q.maxidx to indicate the range of 
allowable queue indices. The push and pop operations are modified to wrap the 
head and tail pointers back to minidx when they are incremented to maxidx. 
By using symbolic values $i_0$ and $i_1$ for these two bounds, we can model systems 
where the queue is of bounded size, but the bound is an arbitrary integer. Note 
that we require two bounds, since our logic does not have any way to denote the 
integer value 0. 
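
The unbounded queue operations can be sketched with an immutable record whose contents field is a function. This is a concrete-integer illustration only; the class name `Queue` and the helper names are assumptions, and the circular-buffer variant is not covered.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Queue:
    contents: Callable[[int], int]  # index -> value of sequence element
    head: int                       # index of the oldest element
    tail: int                       # index at which the next element is inserted

def is_empty(q: Queue) -> bool:
    return q.head == q.tail

def pop(q: Queue) -> Queue:
    # Q'.head = ITE(isEmpty(Q), Q.head, succ(Q.head))
    new_head = q.head if is_empty(q) else q.head + 1
    return Queue(q.contents, new_head, q.tail)

def first(q: Queue) -> int:
    # Only meaningful when the queue is non-empty.
    return q.contents(q.head)

def push(q: Queue, x: int) -> Queue:
    # Q'.tail = succ(Q.tail); Q'.contents = lambda i . ITE(i = Q.tail, X, Q.contents(i))
    old, t = q.contents, q.tail
    return Queue(lambda i: x if i == t else old(i), q.head, t + 1)

# The start index 5 and the constant initial contents are arbitrary choices.
q0 = Queue(lambda i: 0, 5, 5)
q1 = push(push(q0, 10), 20)
print(first(q1), first(pop(q1)), is_empty(pop(pop(q1))))
```

Popping an empty queue leaves the head where it is, so the head can never overtake the tail.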
3.3 Sets 

A set of integers $S$ can be modeled simply using a predicate expression indicating 
whether or not a given value is in the set. In many contexts, however, we would 
also like to be able to enumerate the set elements. To do this, we model a set 
as a record $S$ with components S.list, S.pos, S.minidx, and S.maxidx. The 
set elements are represented by a list, which itself is viewed as a function S.list 
mapping integers ranging between S.minidx and S.maxidx − 1 to the list values. 
The function expression S.pos maps each possible set element to its position in 
the list or to pred(S.minidx) if the element is not in the set. 

The following illustrates how set operations are expressed using this 
representation: 

1. An empty set is denoted by the expressions S.list = $l_0$, S.pos = $\lambda y \,.\, \mathrm{pred}(i_0)$, 
S.minidx = S.maxidx = $i_0$, where $l_0$ is an uninterpreted function (the 
contents are irrelevant since the list is empty), and $i_0$ is an uninterpreted symbol. 

2. Testing membership. Member(S, x) is expressed as $\neg(S.pos(x) = \mathrm{pred}(i_0))$. 

3. Inserting an element, assuming it is not already a member. Insert(S, x) yields 
a set $S'$ having components 

$S'.list = \lambda i \,.\, \mathrm{ITE}(i = S.maxidx,\ x,\ S.list(i))$ 

$S'.pos = \lambda y \,.\, \mathrm{ITE}(x = y,\ S.maxidx,\ S.pos(y))$ 

$S'.minidx = S.minidx$ 

$S'.maxidx = \mathrm{succ}(S.maxidx)$ 

4. Removing an element, assuming it is a member. Delete(S, x) follows a similar 
pattern as for insertion, but we decrement S.maxidx, and we insert the list 
element that was at the final position into the position of the element we are 
deleting: 

$S'.list = \lambda i \,.\, \mathrm{ITE}(i = S.pos(x),\ S.list(\mathrm{pred}(S.maxidx)),\ S.list(i))$ 

$S'.pos = \lambda y \,.\, \mathrm{ITE}(x = y,\ \mathrm{pred}(i_0),\ S.pos(y))$ 

$S'.minidx = S.minidx$ 

$S'.maxidx = \mathrm{pred}(S.maxidx)$ 
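
A minimal sketch of this set representation, again over concrete integers, follows. The record name `ListSet` and the function names are hypothetical, and the not-a-member marker is taken to be the predecessor of the lower index bound. The delete operation mirrors the stated formulas literally, so the position entry of the relocated element is left as it was.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class ListSet:
    lst: Callable[[int], int]  # position -> element
    pos: Callable[[int], int]  # element -> position, or minidx - 1 if absent
    minidx: int
    maxidx: int

def empty(i0: int) -> ListSet:
    # List contents are irrelevant for an empty set; 0 is an arbitrary filler.
    return ListSet(lambda i: 0, lambda y: i0 - 1, i0, i0)

def member(s: ListSet, x: int) -> bool:
    return s.pos(x) != s.minidx - 1

def insert(s: ListSet, x: int) -> ListSet:
    """Assumes x is not already a member."""
    return ListSet(lambda i: x if i == s.maxidx else s.lst(i),
                   lambda y: s.maxidx if y == x else s.pos(y),
                   s.minidx, s.maxidx + 1)

def delete(s: ListSet, x: int) -> ListSet:
    """Assumes x is a member; moves the final list element into x's slot."""
    last = s.maxidx - 1
    return ListSet(lambda i: s.lst(last) if i == s.pos(x) else s.lst(i),
                   lambda y: s.minidx - 1 if y == x else s.pos(y),
                   s.minidx, last)

def elements(s: ListSet) -> List[int]:
    return [s.lst(i) for i in range(s.minidx, s.maxidx)]

s = insert(insert(insert(empty(10), 4), 9), 2)
d = delete(s, 4)
print(elements(s), elements(d), member(d, 4), member(d, 2))
```

Enumeration walks the index range between the two bounds, which is what the plain predicate representation cannot offer.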



3.4 Event Schedulers 

The UCLID verifier performs a symbolic simulation, where on each step of system 
operation, every state element gets updated according to a next-state expression. 
That is, it uses a synchronous concurrency model. In many applications, however, 
we want to model the system as reacting to a series of events that occur in some 
arbitrary order. We can express such a system by constructing a nondeterministic 
event scheduler. 

The details of such a scheduler depend on the particular system being mod- 
eled. The key idea is to use uninterpreted functions to generate arbitrary value 
sequences to represent a sequence of process IDs for the scheduler or a sequence 
of nondeterministic choices. This construction was referred to as a “generator of 
arbitrary values” in [20]. This generator requires a counter ctr as a state element, 
having initial state $c_0$ (an uninterpreted symbol). It is updated on each step using 
the successor operation. From this, we can generate an arbitrary sequence 
using an uninterpreted function $V$, where $V(ctr)$ denotes the value generated on 
each step. Similarly, with an uninterpreted predicate symbol $P$ we can generate 
an arbitrary sequence of truth values with the expression $P(ctr)$. 
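
The generator of arbitrary values can be mimicked concretely by sampling one interpretation of the uninterpreted function. The sketch below does that with a memoized random table, so the same counter value always yields the same result. A verifier reasons about all interpretations at once, whereas this shows just one; the names `uninterpreted` and `run_scheduler` are assumptions for illustration.

```python
import random
from typing import Callable, Dict, List

def uninterpreted(rng: random.Random, values: List[int]) -> Callable[[int], int]:
    """Return one arbitrary but fixed function: equal arguments give equal results."""
    table: Dict[int, int] = {}
    def f(x: int) -> int:
        if x not in table:
            table[x] = rng.choice(values)  # choose once, then remember
        return table[x]
    return f

def run_scheduler(c0: int, steps: int, v: Callable[[int], int]) -> List[int]:
    """Emit V(ctr) on each step, updating ctr with the successor operation."""
    ctr = c0
    trace = []
    for _ in range(steps):
        trace.append(v(ctr))
        ctr += 1
    return trace

V = uninterpreted(random.Random(0), [0, 1, 2])  # e.g. three process IDs
trace = run_scheduler(5, 4, V)
print(trace)
```

Starting a second run one step later reproduces the tail of the first trace, reflecting that the sequence is determined entirely by the counter value.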

4 Verification Techniques 

The UCLID verifier supports several forms of verification. The actual operation 
of the program is expressed using a command language that allows the system 
state to be initialized, the system to be operated for a series of steps, and for 
the states to be captured and compared. A system state is represented as a set 
of CLU expressions (one for each state element). 

5 Symbolic Simulation 

Symbolic simulation involves modeling the operation of the system over a series 
of steps and then checking whether the state values match some expected values 
or satisfy a set of desired properties. Since the states are encoded as CLU expres- 
sions, these checks generally involve calling the decision procedure to determine 
whether the properties hold for all interpretations of the function and predicate 
symbols. 

We have found that many errors in our initial system models can be found 
by simply running bounded property checking, where we operate the system 
starting from an initial system state over a fixed number of steps, checking 
desired properties at each step. 

As shown by Burch and Dill [6], symbolic simulation can also be used to prove 
conformance between a pipelined microprocessor and its sequential specification. 
It involves running two symbolic simulation runs and then comparing them for 
equality. Typically, this form of verification can only be performed when the 
pipeline can be flushed in a bounded number of steps. 

6 Invariant Checking 

A more general form of verification is to prove that some property P holds at 
all times using a form of proof by induction. That is, we prove that P holds for 
the initial state, and that if it holds for some arbitrary state S then it also holds 
for any successor state S'. This approach is commonly used when verifying a 
system using a theorem prover [10]. 

For most systems, we require quantifiers to express the invariant properties 
[11]. That is, the invariant is an expression of the form $\forall X\, P(X, S)$ for system 
state $S$. Verification requires proving that the formula $[\forall X\, P(X, S)] \Rightarrow 
[\forall X\, P(X, S')]$ holds. We can express this as a formula of the form 
$\forall X\, [\exists Y\, \neg P(Y, S) \lor P(X, S')]$. The universally quantified symbols $X$ pose no 
problem, since our validity checker implicitly provides universal quantification, but 
the existentially quantified symbols $Y$ cannot be expressed in CLU. We handle 
this by using quantifier instantiation, in which the expression $\neg P(Y, S)$ is rewritten 
using many different expressions for the symbols in $Y$, and these expressions 
are disjuncted. The different expressions are selected from the subexpressions of 
$P(X, S')$. This approach is sound, but incomplete. 
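
The rewriting and the instantiation step can be spelled out as a short chain, assuming the bound symbols of the antecedent are renamed to fresh symbols $Y$ that do not occur in $P(X, S')$, and writing $t_1, \ldots, t_k$ for the candidate expressions chosen for $Y$ (a notation introduced here for illustration):

```latex
\begin{align*}
[\forall Y\, P(Y,S)] \Rightarrow [\forall X\, P(X,S')]
  &\equiv \neg[\forall Y\, P(Y,S)] \lor [\forall X\, P(X,S')] \\
  &\equiv [\exists Y\, \neg P(Y,S)] \lor [\forall X\, P(X,S')] \\
  &\equiv \forall X\, \bigl[\exists Y\, \neg P(Y,S) \lor P(X,S')\bigr] \\
  &\Leftarrow \forall X\, \Bigl[\bigvee_{i=1}^{k} \neg P(t_i,S) \lor P(X,S')\Bigr]
\end{align*}
```

The last line is quantifier-free apart from the implicit universal quantification, so it can be handed to the validity checker. Each disjunct $\neg P(t_i, S)$ implies $\exists Y\, \neg P(Y, S)$, which gives soundness; a witness for $Y$ need not be among the $t_i$, which is why the method is incomplete.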

Using this approach, we have been able to verify some very complex out-of- 
order microprocessor designs [12]. 



7 Invariant Generation 

Checking invariants is a powerful tool for reasoning about systems, but it requires 
the user to carefully craft a set of induction hypotheses that capture key features 
of the system and that allow the desired system properties to be proved. This 
requires an intricate understanding of the system and a lot of trial and error. 

We have been able to greatly simplify this form of verification by developing 
tools that automatically generate the invariant properties. Our approach is based 
on predicate abstraction [9]. With predicate abstraction, the user supplies a set 
of primitive predicates (typically comparison and equality expressions without 
any Boolean connectives) describing simple properties of the system state. These 
predicates then form a finite-state abstraction of the original system, consider- 
ing the possible valuations of all of the predicate expressions. Using a form of 
symbolic reachability analysis, the predicate abstraction tool can construct a 
formula giving the most general property that holds invariantly for the system. 
Due to the restrictions of CLU logic, we have been able to implement predicate 
abstraction with efficient symbolic algorithms [13]. 

Using predicate abstraction, we have been able to verify safety properties for 
an out-of-order execution unit, a distributed cache protocol [8], and Lamport’s 
bakery algorithm [14]. 



8 Conclusion 

In developing verifiers for infinite-state systems, most researchers have used more 
powerful forms of logic than ours, including features such as linear arithmetic 
[1], Presburger arithmetic [4], general first-order logic [18], and even higher-order 
logic [10]. We have adopted a more conservative approach, adding features to the 
logic only when necessary to express more general system models. We started 
with a logic including only uninterpreted functions with equality [2], but then 
added counters and comparison operators to model ordered structures such as 
queues. Using lambda notation to express updates of function state elements 
required a minimal extension over the conventional support for arrays, but it 
gave us the ability to express a wide range of operations on data structures. We 
have found that CLU is capable of expressing a surprisingly wide range of data 
structures and system types. 

By keeping the logic restricted, we maintain the ability to perform validity 
checking by a translation to Boolean formulas. This provides a decision procedure 
with sufficient capacity that we can add additional capabilities such as quantifier 
instantiation and predicate abstraction, both of which lead to very large CLU 
formulas. 
Abstract. In [4] a new relativization notion, stringent relativization, 
has been introduced for investigating a fine relationship between complexity 
classes. But considering “stringent relativization” is meaningful 
even for more general relationships such as P vs. NP. In this talk we 
explain the motivation of this relativization notion, examine some basic 
facts, and discuss known results and related open problems. 



1 Definitions 

We start with definitions for notions concerning “stringent relativization”. 
Explanation of its meaning and motivation will be given in the next section. The 
definitions we give below are from [5], which are slightly different from those in [4]. 

We assume that the reader is familiar with basic notions and notations in 
complexity theory; see, e.g., [2,7] for them. As usual, we assume that strings are 
binary sequences in {0,1}*, a problem is specified by a language, i.e., a set of 
strings, and a complexity class is defined by a class of languages recognized by 
some machinery (machine or circuit) of a certain type within a certain resource 
bound. Since we consider relativized computation, we assume such machineries 
can make queries to a given oracle. In the following, we will bound query length 
by using some length bound, which is simply a function from $\mathbb{N}$ to $\mathbb{N}$. We assume 
that length bounds are all “reasonable”, such as time/space constructible in each 
context. 

Definition 1. Let $M$ be any query machine (or circuit family with query gates) 
of any type. For any length bound $\ell(n)$, we say $M$ is executed under $\ell(n)$-length 
restricted relativization (relative to an oracle set $X$) if any query of length $> \ell(n)$ 
gets a ‘no’ answer (while any query of length $\le \ell(n)$ gets an answer consistent 
with $X$) on the computation of $M$ on any input of length $n$. 
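
Definition 1 can be pictured as a wrapper around the oracle. The sketch below assumes the second condition reads "length at most the bound", models the oracle as a finite Python set of binary strings, and uses the hypothetical name `length_restricted`; real oracles are of course infinite sets.

```python
from typing import Callable, Set

def length_restricted(oracle: Set[str],
                      bound: Callable[[int], int],
                      n: int) -> Callable[[str], bool]:
    """Answer queries for a computation on an input of length n."""
    limit = bound(n)
    def answer(query: str) -> bool:
        if len(query) > limit:
            return False        # over-long queries always get 'no'
        return query in oracle  # otherwise answer consistently with X
    return answer

X = {"0", "0101", "111111"}
ask_n2 = length_restricted(X, lambda n: 2 * n, 2)  # bound 2n, input length 2
ask_n3 = length_restricted(X, lambda n: 2 * n, 3)  # same bound, input length 3

print(ask_n2("0101"), ask_n2("111111"), ask_n3("111111"))
```

The same oracle string is visible or invisible depending only on the input length, which is the restriction both machines must share when two classes are compared.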
Roughly speaking, for stringent relativizations, we propose to compare two 
machines $M_1$ and $M_2$ executed under length restricted relativization, and discuss, 
for example, whether $M_1$ simulates $M_2$ relative to some oracle $X$ under 
length restricted relativization. 

While two machines can be compared in this way, some care is needed in 
order to obtain a reasonable definition to compare two complexity classes. Here 
we propose the following definition for a basic “stringent relativization” notion. 
(We will use symbol G to denote an oracle for stringent relativization.) 

Definition 2. Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be any complexity classes represented by relativizable 
query machines, and let $\ell(n)$ be any length bound. For any oracle set $G$, we 
say that containment $\mathcal{C}_1 \subseteq \mathcal{C}_2$ is shown by $\ell(n)$-length stringent relativization 
relative to $G$ (or, in short, $\mathcal{C}_1^G \subseteq \mathcal{C}_2^G$ under $\ell(n)$-length stringent oracle access) 
if for any query machine $M_1$ for $\mathcal{C}_1$, there exists some query machine $M_2$ for 
$\mathcal{C}_2$ such that $L(M_2) = L(M_1)$ holds when both machines are executed under 
$\ell(n)$-length restricted relativization. 

Remark. The notion for separation (i.e., $\mathcal{C}_1 \not\subseteq \mathcal{C}_2$) is defined in the same way. 

Clearly, the bound on query length is vacuous in the above definition if 
the length bound $\ell(n)$ is larger than the bound allowed for complexity classes 
$\mathcal{C}_1$ and $\mathcal{C}_2$. Thus, one should choose an appropriate length bound. In particular, 
for relationships between polynomially resource bounded classes, we propose the 
following stringency. 

Definition 3. Let $\mathcal{C}_1$ and $\mathcal{C}_2$ be any complexity classes represented by relativizable 
query machines. We say that their relationship (i.e., containment or separation) 
is shown by polynomially stringent relativization if for any length bound 
$\ell(n)$, $O(n) \le \ell(n) \le n^{O(1)}$, there exists some oracle relative to which the 
relationship is shown by $\ell(n)$-length stringent relativization. 

Remark. 

1. By “for any length bound $\ell(n)$, $O(n) \le \ell(n) \le n^{O(1)}$, ... holds”, we mean 
precisely that “there exists some constant $c > 1$ such that for any constant 
$d > 0$ and for any length bound $\ell(n)$ such that $cn \le \ell(n) \le n^d$ for almost all 
$n$, ... holds.” 

2. An oracle can be (in most cases “should be”) defined for each length bound 
$\ell(n)$. But since each oracle for $\ell(n)$ is defined similarly (in most cases), we 
will simply state it as if a single oracle exists as in, e.g., “$\mathcal{C}_1^G \subseteq \mathcal{C}_2^G$ by 
polynomially stringent relativization.” 

We remark on some subtle points of our stringent relativization notion.

Notice first that we consider only query length bounds $\ell(n)$ such that
$\ell(n) \ge cn$ for some $c > 1$. Though we could use a slightly stronger
stringency by changing “some” to “any”, we need some sort of linear lower
bound anyway for our current stringent relativization proofs. We will explain
this point further in Section 3.

Note also that we compare complexity classes for each fixed query length
bound $\ell(n)$. The stringency would become much weaker if we could use
different length bounds for the two machines. For example, consider the
relationship $\mathrm{NP} \subseteq \mathrm{P}$. If we can use some polynomial
as a query length bound for the deterministic machine simulating a given
$p(n)$-time bounded nondeterministic machine, then the standard relativization
that is explained above works by using $q(n)$ ($> p(n)$) as a length bound.

2 Why Stringent Relativization? 

We can give two motivations for introducing/investigating the notion of stringent 
relativization. One is for a better relativization scenario; another is for providing 
an interesting new topic in complexity theory. 

2.1 For a Better Relativization Scenario 

The notion of relativized computation was introduced in [1] to demonstrate the
hardness of proving or disproving various open questions on complexity classes.
In particular, the authors of [1] showed a “relativized world” where P = NP
holds relative to some oracle set; this is taken as evidence supporting the
hardness of proving the $\mathrm{P} \neq \mathrm{NP}$ conjecture. To provide
convincing evidence supporting such conjectures, it would be better to design
the augmented computations allowed by the relativization so that the
comparisons are as fair as possible between the two classes.

Take the $\mathrm{P}^A = \mathrm{NP}^A$ result as an example. A standard way of
proving this collapsing relativized result is to use some PSPACE-complete set,
say, QBF, as an oracle $A$. Then the result is immediate from the following
relation.

$\mathrm{NP}^{\mathrm{QBF}} \subseteq \mathrm{PSPACE}^{\mathrm{QBF}} = \mathrm{PSPACE} \subseteq \mathrm{P}^{\mathrm{QBF}}.$

While this is a valid argument, it may not be completely satisfactory for
establishing that “there is some parallel world where P = NP indeed holds.”

To see the problem, let us consider in a bit more detail the simulation of any
NP query machine $N_0$ by some P query machine relative to QBF. Assume that
$N_0$ is an $O(n^2)$-time bounded machine. Then one can easily design some
PSPACE machine $N_1$ simulating $N_0^{\mathrm{QBF}}$; more precisely, for any
input $x$ of length $n$, $N_1(x)$ simulates $N_0^{\mathrm{QBF}}(x)$ using
$O(n^2)$ space. Then by using the standard reduction, we can express the
computation of $N_1(x)$ in terms of some QBF formula $F_x$. That is, $N_1$
accepts $x$ if and only if $F_x \in \mathrm{QBF}$. Thus, by asking one query
(namely, $F_x$) to the oracle QBF, one can simulate $N_1(x)$ (which is
$N_0^{\mathrm{QBF}}(x)$). Since $F_x$ is polynomial-time computable from $x$,
this proves that $N_0^{\mathrm{QBF}}$ is simulated by some deterministic
polynomial-time machine $M_0$ asking one query to QBF. Now compare the length
of queries asked by these two machines $N_0$ and $M_0$. For any input $x$ of
length $n$, let the queries that $N_0$ makes on $x$ be up to $\ell$ in length,
which is bounded by $O(n^2)$. On the other hand, $M_0$ asks $F_x$, which is a
QBF formula of length $> \ell$ so long as $F_x$ is defined by a standard
reduction to QBF. That is, the simulating machine $M_0$ asks strictly longer
queries than those asked by the simulated machine $N_0$. We think that this
comparison is not so fair because the simulating machine can use information
that the simulated machine cannot access.

Intuitively, for each oracle set X, the relative computation model allowing 
oracle queries to X provides a “relativized complexity world” where all compu- 
tation is the same as our real world except that one can use some special set of 
instructions, i.e., queries to the oracle X. From this point of view, the compari- 
son is not fair if the simulating machine can access some part of the oracle that 
the simulated one cannot access. 

Let us discuss this point in more detail. For our example, consider the relation
between $\mathrm{DSIZE}[n^3]$ and $\mathrm{NSIZE}[n^2]$, where

$\mathrm{DSIZE}[n^3]$ = the class of sets recognized by $O(n^3)$-size circuits, and
$\mathrm{NSIZE}[n^2]$ = the class of sets recognized by $O(n^2)$-size nondet. circuits.

Here we define circuit size to be the number of wires connecting gates. A
nondeterministic circuit of size $s$ is an ordinary circuit with additional
$m < s$ input gates. For a given input $x$, the circuit accepts $x$ iff it
yields 1 on $x$ with some $m$ bits given to the additional input gates.
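The acceptance condition for nondeterministic circuits can be sketched in a few lines of Python. The sketch below is only illustrative: a circuit is modelled as an ordinary function of the input bits and the $m$ additional bits, and the toy circuit `has_a_one` is a hypothetical example, not one taken from the paper.

```python
from itertools import product
from typing import Callable, Sequence

def nondet_accepts(circuit: Callable[[Sequence[int], Sequence[int]], int],
                   x: Sequence[int], m: int) -> bool:
    """Accept x iff some choice of the m additional input bits makes the circuit output 1."""
    return any(circuit(x, w) == 1 for w in product((0, 1), repeat=m))

def has_a_one(x: Sequence[int], w: Sequence[int]) -> int:
    """Toy circuit: w must select exactly one position, and x must be 1 there."""
    exactly_one = sum(w) == 1
    hit = any(xi & wi for xi, wi in zip(x, w))
    return int(exactly_one and hit)
```

The exhaustive loop over all $2^m$ settings of the additional bits is what a deterministic simulation would have to avoid; the sketch only states the semantics.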

We would conjecture that $\mathrm{NSIZE}[n^2] \not\subseteq \mathrm{DSIZE}[n^3]$;
yet, it would be difficult to prove. Thus, to discuss the hardness of proving
this conjecture, a collapsing “relativized” result we need is

$\mathrm{NSIZE}[n^2] \subseteq \mathrm{DSIZE}[n^3].$

That is, we would like to prove the above in some “parallel world” that is defined 
by a nonstandard model of computation. 

(Relativization Type 1) 

A nonstandard computation model can be defined by extending the primitives for
computation. For the circuit model, one would naturally think of using gate
types that can achieve more complicated computation than AND, OR, or NOT. For
example, as our (Type 1) model, we may allow gates implementing any Boolean
functions on some fixed number, say 5, of input bits. Clearly, there is no
essential difference! The model is equivalent to the standard one.

(Relativization Type 2) 

Then to obtain a different model, we need to allow (a family of) gates with an
unbounded number of inputs. For example, we assume that circuits can use a
family of gates $\{Q_\ell\}_{\ell > 0}$, where each $Q_\ell$ computes some
(a priori fixed) function from $\{0,1\}^\ell$ to $\{0,1\}$. This is the model
corresponding to the standard relativization. (Recall that circuit size is
defined by the number of wires connecting gates; thus, any circuit of size $s$
cannot use gates $Q_\ell$ if $\ell > s$.)

The important point here is that the set of gates that can be used by circuits
varies depending on their size bounds. Hence, the set of primitives is not the
same for designing circuits of size, e.g., $O(n^2)$ and $O(n^3)$. For example,
an $n^3$-size circuit simulating a nondet. $n^2$-size circuit can make use of
gates, e.g., $Q_{n^{2.5}}$, that are not allowed for $n^2$-size (nondet.)
circuits. Thus, simulating $O(n^3)$-size circuits can make use of a set of
primitives that is larger than the one that simulated $O(n^2)$-size nondet.
circuits can use.

(Relativization Type 1½)

To avoid the above not-so-fair comparison, let us consider an intermediate model
between (Type 1) and (Type 2). Our requirement is: (i) use an infinite family
of gates such as $\{Q_\ell\}_{\ell > 0}$ with an unbounded number of inputs,
and (ii) use the same set of primitives for simulating and simulated circuits.
Then one natural approach is to bound $\ell$ for gates $Q_\ell$ by some fixed
function $\ell(n)$ on input size $n$. This is the model for our
$\ell(n)$-stringent relativization. (Note that the query length bound $\ell(n)$
must be smaller than the size bounds of both simulating and simulated circuits.
In the above example, $\ell(n)$ should be at most $n^2$. In fact, for
“polynomially bounded” complexity classes, we propose to use $\ell(n) = cn$ for
the query length bound.)
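The difference between the Type 2 model and the intermediate model can be seen by listing which gate arities $\ell$ each circuit may use. The following Python sketch is illustrative only; the function names and the concrete numbers are assumptions chosen for the example.

```python
def gates_type2(size: int) -> range:
    """Arities l of gates Q_l usable by a circuit of the given size (Type 2 model)."""
    return range(1, size + 1)

def gates_stringent(size: int, n: int, ell) -> range:
    """Arities usable when l is additionally capped by the length bound ell(n)."""
    return range(1, min(size, ell(n)) + 1)
```

For input size $n = 10$, a size-$n^3$ circuit may use strictly more Type 2 gates than a size-$n^2$ circuit, whereas with the cap $\ell(n) = 2n$ both circuits draw on the same set of primitives.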

Looking back over existing relativized results in standard textbooks, e.g.,
[2,7], we notice that almost all collapsing results are proved by an argument
similar to the above P vs. NP case. We propose to reconsider these collapsing
results under polynomially stringent relativization.

In the literature, we can find a similar approach. Book, Long, and Selman
[3] introduced the notion of “positive relativization”; their motivation is
similar to ours. For example, consider P vs. NP again. The standard oracle $B$
for $\mathrm{P}^B \neq \mathrm{NP}^B$ is constructed by making use of the fact
that NP-machines can query exponentially many strings whereas P-machines can
query only polynomially many strings. For our discussion, let us restrict the
length of queries to $n$, i.e., the input length. The above standard oracle
construction for $B$ still works, and under this restriction, the set of
primitives is the same for both machine models. There is, however, a big
difference in the set of primitives that are actually used in computation. In
positive relativization, this difference is avoided by bounding the total
number of query strings by some polynomial in $n$.

It should be noted here that positive relativization aims for fair comparisons
for separation and that it does not solve the not-so-fair situation in
collapsing results such as $\mathrm{P}^A = \mathrm{NP}^A$. To check this point,
let us recall some definitions. For any function $q(n)$, define
$\mathrm{NP}^X_{q(n)}$ to be the class of sets recognized by some NP-machine
relative to $X$, where the total number of query strings in the whole nondet.
computation is at most $q(n)$. Then define $\mathrm{NP}^X_{\mathrm{b}}$ to be
the union of $\mathrm{NP}^X_{q(n)}$ for all polynomials $q$. Classes
$\mathrm{P}^X_{q(n)}$ and $\mathrm{P}^X_{\mathrm{b}}$ are defined similarly. The
positive relativization is to compare classes such as
$\mathrm{NP}^X_{\mathrm{b}}$ and $\mathrm{P}^X_{\mathrm{b}}$.

Notice here that bounding the total number of query strings is a restriction
for relativized nondeterministic computations, not for deterministic ones. For
example, $\mathrm{P}^X_{\mathrm{b}}$ is nothing but $\mathrm{P}^X$, because the
restriction is not essential. In other words, the restriction is for a
relativized class $\mathrm{NP}^X$ not to become too strong when proving
$\mathrm{P}^X \neq \mathrm{NP}^X$. Thus, this restriction does not work for
fair collapsing results. In fact, we still have
$\mathrm{P}^{\mathrm{QBF}}_1 = \mathrm{NP}^{\mathrm{QBF}}_1$; that is, the
collapse can be proved relative to QBF even if we restrict machines to query at
most one string. The restriction to $\mathrm{NP}_1$ only makes NP weaker, and
in P one needs to query QBF only once to get PSPACE.

2.2 For Providing an Interesting Topic in Complexity Theory 

“Why bother with such points?! We know of only a few lower bound techniques,
and the standard relativization results are enough to convince us that these
techniques are hopeless. After all, we all know that proving
$\mathrm{P} \neq \mathrm{NP}$ is hard!”

Maybe one can take a phenomenological approach and simply concede that
such problems are hard. However, in our view, it is worthwhile to delineate the
reason and boundary of such difficulties. One useful perspective on this
difficulty is provided by relativizations. Relativized separating/collapsing
results tell us something about the power of different computation mechanisms.
For example, the fact that there is an oracle $B$ such that
$\mathrm{P}^B \neq \mathrm{NP}^B$ shows us that nondeterminism is essential to
extract information from a certain oracle within polynomial time.

In general, by comparing relativized classes $\mathcal{C}$ and $\mathcal{D}$,
we could investigate some sort of “computational complexity” that these classes
stand for, though this comparison may be independent from the real relationship
between $\mathcal{C}$ and $\mathcal{D}$. In a nutshell, relativized complexity
classes have provided us with interesting and reasonable complexity theoretic
problems. In fact, several important results have been obtained through
investigations of relativized complexity classes. For example, the search for
relativized separation of PH and PSPACE has led to the investigation of
constant depth boolean circuits [8].

Similarly we can hope that our new relativization model would provide us 
with some interesting new problems to work with. Of course, it is important 
that (i) those problems created by the new model should be “new”, and (ii) the 
problems should be “reasonable” . 

For the first point (i), the positive relativization, while well motivated and
interesting, did not create new problems. This is because of the following
relation proved by Book et al. in [3] by a direct simulation of
$\mathrm{NP}^X_{\mathrm{b}}$-machines.

$\exists X\, [\, \mathrm{P}^X_{\mathrm{b}} \neq \mathrm{NP}^X_{\mathrm{b}} \,] \iff \mathrm{P} \neq \mathrm{NP}.$

We can show similar relationships for the other comparisons; hence, problems for 
positive relativization are rather direct restatements of existing problems. On the 
contrary, as shown in the following section, problems on stringent relativization 
classes seem different from any of the existing problems. 

The second point (ii) is not easy to explain. In fact, it is difficult to give 
technical criteria for “reasonable” problems. But at least we can show that our 
stringent relativization model is a natural extension/restriction of existing and 
well-studied models. 

As we have seen above, stringent relativization is a natural restriction of the
standard relativization. Also, stringent relativized computation is equivalent
to a generalization of the nonuniform computation with advice strings proposed
by Karp and Lipton [10]. Intuitively, any relativized result can be regarded as
a comparison between complexity classes under a certain nonuniform setting
provided by an (infinite) advice, namely an oracle. Here we generalize the
advice string formulation of Karp and Lipton by allowing random access to the
advice string, so that advice strings longer than polynomial length become
meaningful for polynomial-time bounded computations. Now, for example, our
$2n$-stringent relativization is equivalent to allowing random access, for any
input of length $n$, to an advice string of length $2^{2n}$ that is provided by
some a priori fixed advice function.
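The correspondence between a length-restricted oracle and a randomly accessible advice string can be sketched as follows. This is a minimal Python illustration under a simplifying assumption: only queries of length exactly $2n$ are considered, so that the advice for input length $n$ has exactly $2^{2n}$ bits. The function names are hypothetical.

```python
from typing import Set

def advice_for_length(G: Set[str], n: int) -> str:
    """Advice of length 2**(2n): bit i is 1 iff the i-th string of length 2n is in G."""
    m = 2 * n
    return "".join("1" if format(i, f"0{m}b") in G else "0" for i in range(2 ** m))

def query_via_advice(advice: str, q: str) -> bool:
    """Answer the oracle query q by random access into the advice string."""
    return advice[int(q, 2)] == "1"
```

A polynomial-time machine never reads the whole advice string; it only needs to compute the index `int(q, 2)` of each query it makes, which is why advice of exponential length remains meaningful here.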



3 Technical Discussions 

3.1 Polynomially Stringent Relativization Results 

As far as the authors know, all separation proofs by the standard relativization 
are in fact polynomially stringent. On the other hand, almost all known con- 
tainment (or equivalence) proofs are not, and these are relations that we need 
to reconsider. 

Here let us see more technically the problem of the standard oracle
construction, and explain one approach to obtain a desired containment result
by the stringent relativization. Suppose now we want to define a standard
oracle $A$ with which some (inferior type) machine $M_1$ can simulate some
(superior type) machine $M_2$. What we usually do (in most standard oracle
constructions) is to use some segment of the oracle to encode the results of
the simulated machine $M_2$ (on inputs of the current length $n$). Since this
segment is allocated to a part that $M_2$ cannot access (when executed on any
input of length $n$), we do not need to worry at all whether the encoding
affects the results of $M_2$ (on length $n$ inputs). But this technique is not
allowed in the stringent relativization. There we need to encode the results of
$M_2$ in a part that $M_2$ can also see. One proof approach is to guarantee
that there is a segment where the encoding does not affect $M_2$'s computation.

The following proof [11] is one example of this approach. 

Proposition 1. $\mathrm{NP} \subseteq \mathrm{P}/\mathrm{poly}$ by polynomially
stringent relativization. More precisely, for any polynomial length bound
$\ell(n) \ge 3n$, some oracle $G$ can be constructed so that
$\mathrm{NP}^G \subseteq \mathrm{P}^G/\mathrm{poly}$ holds by $\ell(n)$-length
stringent relativization.

Proof. Consider a standard enumeration $N_1, N_2, \ldots$ of all NP query
machines. We define $G$ by stages. For each $n$, we define $G^{=3n}$ so that
one can easily compute $N_i^G(x)$ for any machine $N_i$ among the first $k$
machines in the enumeration and for all input strings $x$ of length $n$.

Let us first fix some $N_i$ and discuss how the oracle set $G$ is defined. Let
$p_i$ be the polynomial time bound for $N_i$. Here we may assume that $n$ is
sufficiently large so that $p_i(n) 2^n < 2^{2n}$ holds.

Starting from $0^n$, we examine lexicographically each input
$x \in \{0,1\}^n$ and determine the membership of some length $3n$ strings in
$G$. (We may assume that $G^{<3n}$ has been fixed before this stage.) Suppose
that we have examined the strings before $x$; at this point, $G^{=3n}$ is only
partially fixed. Considering all extensions of $G^{=3n}$ consistent with all
the strings fixed in or out of $G^{=3n}$ so far, we look for some
nondeterministic path that accepts $x$ if $G$ is extended appropriately. If
such an accepting path exists, then we extend $G$ consistently with this
accepting path (i.e., we fix the membership of some strings in $\{0,1\}^{3n}$
in $G^{=3n}$), and proceed to the next input string. (Note that this incurs at
most $p_i(n)$ additional fixed positions.) Otherwise, i.e., if no consistent
extension has such an accepting path, then $N_i$ rejects $x$ as long as $G$ is
extended consistently with all the positions fixed so far. In this case, we
simply proceed to the next input string.

Suppose that $G^{=3n}$ has been partially defined as above by examining all
input strings of length $n$. Since the machine asks at most $p_i(n)$ strings on
each nondeterministic path, for each input string, we only have to fix the
membership of at most $p_i(n)$ strings of length $3n$ to force $N_i$ to accept
$x$ (if it is indeed possible). Hence, at most $p_i(n) 2^n$ strings of length
$3n$ are fixed after examining all $2^n$ input strings of length $n$.
Therefore, there is some string $u_i$ of length $2n$ such that no string in
$u_i\{0,1\}^n$ is fixed when defining $G$. We simply use this part to encode
the information whether $N_i^G$ accepts $x$ or not. That is, for each
$x \in \{0,1\}^n$, the string $u_i x$ is put into $G$ if $N_i^G$ accepts $x$;
no other string is added to define $G^{=3n}$. Notice that adding these unfixed
strings to $G$ does not change the acceptance of $N_i^G$ on any input of length
$n$. On the other hand, since $u_i x$ is in $G$ if and only if $N_i^G$ accepts
$x$, one can easily compute $N_i^G(x)$ by checking whether $u_i x$ is in $G$.
For this computation, it is necessary to know this key $u_i$, but this
information can be hardwired in a polynomial-size circuit.
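The counting step of the proof, finding a key $u_i$ whose segment $u_i\{0,1\}^n$ contains no fixed string, can be sketched by brute force in Python. This is only an illustration on tiny parameters: the set `fixed` and the predicate `accepts` are hypothetical stand-ins for the positions fixed during the stage and for the acceptance behaviour of $N_i^G$.

```python
from itertools import product
from typing import Callable, Optional, Set, Tuple

def find_free_prefix(fixed: Set[str], n: int) -> Optional[str]:
    """Return some u in {0,1}^{2n} such that no fixed string of length 3n starts with u."""
    used = {s[: 2 * n] for s in fixed if len(s) == 3 * n}
    for bits in product("01", repeat=2 * n):
        u = "".join(bits)
        if u not in used:
            return u
    return None  # cannot happen when fewer than 2**(2n) strings are fixed

def encode_results(G: Set[str], fixed: Set[str],
                   accepts: Callable[[str], bool], n: int) -> Tuple[str, Set[str]]:
    """Add u + x to G exactly for the inputs x of length n that are accepted."""
    u = find_free_prefix(fixed, n)
    if u is None:
        raise ValueError("no free segment: too many fixed strings")
    new = {u + "".join(x) for x in product("01", repeat=n) if accepts("".join(x))}
    return u, G | new
```

A circuit with the key `u` hardwired then decides acceptance of `x` with the single query `u + x`.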

Precisely speaking, we need to simulate all $k$ machines
$N_1, N_2, \ldots, N_k$ for some $k$, where $k$ grows unboundedly as $n$
increases. By an argument similar to the above, such simulation is possible if
$n$ is large enough so that $\sum_{1 \le i \le k} (p_i(n) + 1) 2^n < 2^{2n}$.
We can encode all results by using $k$ segments,
$u_1\{0,1\}^n, \ldots, u_k\{0,1\}^n$, with strings $u_1, \ldots, u_k$ in
$\{0,1\}^{2n}$. Then a simple circuit with the key string $u_i$ hardwired can
compute $N_i^G(x)$ for all $x \in \{0,1\}^n$.
□
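The room condition $\sum_{1 \le i \le k} (p_i(n) + 1) 2^n < 2^{2n}$ used in the proof is easy to check numerically. The Python sketch below is illustrative; the time bounds $n^2$ and $n^3$ in the usage note are assumed examples, and the function names are hypothetical.

```python
from typing import Callable, Optional, Sequence

def stage_has_room(time_bounds: Sequence[Callable[[int], int]], n: int) -> bool:
    """Check sum_i (p_i(n) + 1) * 2**n < 2**(2n) for the given time bounds."""
    needed = sum((p(n) + 1) * 2 ** n for p in time_bounds)
    return needed < 2 ** (2 * n)

def first_stage_with_room(time_bounds: Sequence[Callable[[int], int]],
                          limit: int = 1000) -> Optional[int]:
    """Smallest n up to limit at which the condition holds, or None."""
    for n in range(1, limit + 1):
        if stage_has_room(time_bounds, n):
            return n
    return None
```

For two machines with time bounds $n^2$ and $n^3$, the condition first holds at $n = 11$, where $(122 + 1332) \cdot 2^{11} < 2^{22}$.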



Note that this type of oracle construction can be easily merged with oracle 
construction for separating results. For example, it is easy to extend the above 
result as follows. 

Corollary 1. We have
$\mathrm{P}^G \subsetneq \mathrm{NP}^G \subseteq \mathrm{P}^G/\mathrm{poly}$ by
polynomially stringent relativization.

As we will see in the next subsection, polynomially stringent relativization
results are not, in general, robust under the polynomial-time reducibility.
Thus, the above result $\mathrm{NP}^G \subseteq \mathrm{P}^G/\mathrm{poly}$
does not necessarily imply stringent relativizations such as
$\mathrm{P}^{\mathrm{NP}^G} \subseteq \mathrm{P}^G/\mathrm{poly}$, ...,
$\mathrm{PH}^G \subseteq \mathrm{P}^G/\mathrm{poly}$, etc.

Higher collapsing results such as
$\mathrm{PH}^G \subseteq \mathrm{P}^G/\mathrm{poly}$ are provable by the same
approach, but with a somewhat more involved argument. In [5], the relation
$\mathrm{PH}^G \subseteq \mathrm{BPP}^G$ is proved by polynomially stringent
relativization. We use the decision tree version [6] of the switching lemma of
Håstad [9]. The idea is to use some appropriate (random) restriction to define
the oracle $G$ partially, under which computations of a target
$\Sigma_d^p$-machine are simplified to small decision trees; then, we can fix
these computations and still keep enough room to encode the results of these
computations. Since $\mathrm{BPP} \subseteq \mathrm{P}/\mathrm{poly}$, this
result implies $\mathrm{PH}^G \subseteq \mathrm{P}^G/\mathrm{poly}$ by
polynomially stringent relativization.

Even more recently, we could improve the above result to
$\mathrm{P}^G = \mathrm{PH}^G$. Our approach is slightly more general. Suppose
that for any given $\Sigma_d^p$-machine $M_2$, we want to define an oracle $G$
relative to which some polynomial-time deterministic machine $M_1$ can simulate
$M_2$ under, say, $2n$-length restricted relativization. Suppose also that $G$
is defined up to length $2n - 1$, and that our task is to define $G^{=2n}$ so
that the simulation works for all inputs of length $n$. Then for any input
string $x$ of length $n$, when it is fixed, the computations of the two query
machines $M_1$ and $M_2$ on $x$ can be regarded as functions $f_{1,x}$ and
$f_{2,x}$ on Boolean variables $z_1, \ldots, z_N$, where each $z_i$,
$1 \le i \le N = 2^{2n}$, denotes whether the $i$th string $w_i$ in
$\{0,1\}^{2n}$ is put into $G^{=2n}$ or not. That is, our task is to find an
assignment to $z_1, \ldots, z_N$ such that the equation

$f_{1,x}(z_1, \ldots, z_N) = f_{2,x}(z_1, \ldots, z_N)$    (1)

holds for all $x \in \{0,1\}^n$. Again we use the switching lemma to show that
there exists a partial assignment to $z_1, \ldots, z_N$ so that the functions
$f_{2,x}$ (for all $x \in \{0,1\}^n$) become simple ones that are computable by
some small decision tree. Then an algebraic argument is used to show that
equation (1) holds for all $x$ with $f_{1,x}$ being a simple parity function
over some polynomial-size subset of $\{z_1, \ldots, z_N\}$. This yields the
following result.

Theorem 1. For any $\varepsilon > 0$ and any polynomial length bound
$\ell(n) \ge (1 + \varepsilon)n$, some oracle $G$ can be constructed so that
$\mathrm{PH}^G \subseteq \mathrm{P}^G$ holds by $\ell(n)$-length stringent
relativization.
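The task expressed by equation (1), namely finding one assignment to $z_1, \ldots, z_N$ that makes the two machines agree on every input $x$, can be illustrated on a toy instance by exhaustive search. The Python sketch below is not the switching-lemma argument of the proof; it only shows what is being solved. The functions `f1` (a parity) and `f2` (a NAND) are hypothetical examples for $n = 1$, $N = 4$.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

def find_assignment(f1: Callable, f2: Callable,
                    inputs: Sequence[int], N: int) -> Optional[Tuple[int, ...]]:
    """Return the first z in {0,1}^N with f1(x, z) == f2(x, z) for every input x."""
    for z in product((0, 1), repeat=N):
        if all(f1(x, z) == f2(x, z) for x in inputs):
            return z
    return None

def f1(x: int, z: Sequence[int]) -> int:
    """Toy simulating machine: parity of two oracle bits selected by x."""
    return z[2 * x] ^ z[2 * x + 1]

def f2(x: int, z: Sequence[int]) -> int:
    """Toy simulated machine: NAND of two oracle bits selected by x."""
    return 1 - (z[x] & z[2 + x])
```

Exhaustive search takes $2^N = 2^{2^{2n}}$ steps in general, so it says nothing about how the oracle is actually constructed; the proof instead restricts most variables first so that each $f_{2,x}$ becomes a small decision tree.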

Once we have this collapsing result, it is easy to modify the proof to have a 
collapse to any level of the polynomial-time hierarchy. Also as mentioned above, 
it is easy to merge such an oracle construction with the known oracle construction 
[12] for separation. Then the next corollary follows. 

Corollary 2. For any $d > 1$, we have
$\mathrm{P}^G \subsetneq \Sigma_1^{p,G} \subsetneq \cdots \subsetneq \Sigma_d^{p,G} = \mathrm{PH}^G$
by polynomially stringent relativization.

3.2 Relation to the Standard Relativization 

One big difference from the standard relativization is that polynomially stringent 
relativization relations are not invariant under the polynomial-time reducibility. 
We give an example for this. 

Consider the following sets, a canonical complete set for $\mathrm{NP}^A$ and
its variation. (Below we use $N$ to denote nondeterministic Turing machines.)

$K1(A) = \{\, \langle N, x, 0^t \rangle : N^A \text{ accepts } x \text{ in } t \text{ steps} \,\}$, and

$K2(A) = \{\, \langle N, x, 0^t \rangle : N^A \text{ accepts } x \text{ in } t^2 \text{ steps} \,\}$.

Clearly, for any $X$, both are polynomial-time many-one reducible to $K1(X)$.
In fact, since the identity function reduces $K1(X)$ to $K1(X)$, $K1(X)$ is in
$\mathrm{P}^{K1(X)}$ even under $n$-length restricted relativization. On the
other hand, it is easy to define a set $T$ so that $K2(T)$ cannot be in
$\mathrm{P}^{K1(T)}$ by linearly length restricted relativization. We simply
consider the following set:

$L_T = \{\, 0^n : \exists y\, [\, |y| = n^2 \wedge y \in T \,] \,\}$.

Then by a standard diagonalization argument, we can define $T$ so that $L_T$ is
not in $\mathrm{DTIME}^T[2^{O(n)}]$. But if $K2(T)$ were in
$\mathrm{P}^{K1(T)}$ under $O(n)$-length restricted relativization, then $L_T$
could be recognized deterministically in $2^{O(n)}$ steps.

Let P^ be the class of sets recognized in polynomial-time relative to X under, 
say, n-length restricted relativization. The above example shows that this class 
is not closed under the polynomial-time (Turing) reducibility; that is, 

pKl(T) ^ 

because the righthand class contains K2(T), which is not in the lefthand class. 
This example suggests that the following relation cannot be derived immediately,
or it may¹ not hold in general.

Thus, even if we have $\mathrm{P}^G = \mathrm{NP}^G$ by polynomially stringent
relativization, it may not immediately imply that
$\mathrm{P}^G = \mathrm{P}^{\mathrm{NP}^G}$ by polynomially stringent
relativization.

Intuitively, polynomially stringent relativization is a restriction of the stan- 
dard relativization. One may ask whether this is technically true, that is, whether 
any standard relativized collapsing result follows from the one proved by poly- 
nomially stringent relativization. While such a relation has not been proved, 
we can at least claim [5] that any polynomially stringent relativized collapsing 
result can be modified easily for the standard one, provided that the stringent 
oracle is constructed following the standard stage construction, like the one for 
Theorem 1. 



3.3 Some Open Questions 

We can consider the following three types of open problems. 

(1) Stringent relativized containment relations: 

One of the most important and interesting relations is P = PSPACE. Note
that in order to prove P = PSPACE by polynomially stringent relativization, we
will essentially have to satisfy equation (1) for all $x \in \{0,1\}^n$, with
some easy function for $f_{1,x}$ and any function $f_{2,x}$ corresponding to
some PSPACE computation on input $x$. For nondet. and co-nondet. classes, the
relation NEXP = co-NEXP would be interesting.

¹ The authors have not been able to obtain, however, an example such that this
relation fails for some reasonable classes $\mathcal{C}$ and $\mathcal{D}$.

(2) Stringent relativization with sublinear query length bounds:

Prove/disprove collapsing relations (or even separation ones) by
$\ell(n)$-length stringent relativization for some sublinear length bound
$\ell(n)$. Note that this question for $\ell(n) = O(\log n)$ is similar to
(though not the same as) the one for complexity classes with polynomial-size
advice. For example, from the fact that
$\mathrm{PH} \not\subseteq \mathrm{DSIZE}[n^d]$ for any fixed $d > 0$, it
follows that P = PH is not provable by $c \log n$-length stringent
relativization for any fixed $c > 0$. On the other hand, no interesting results
have been proved for larger sublinear length bounds $\ell(n)$ such as
$\ell(n) = (1 - \varepsilon)n$ for some $\varepsilon > 0$.

(3) Stringent relativization for lower complexity classes: 

Although we have focused our discussion on “polynomial” complexity classes, the
notion of stringent relativization could also be worth studying for complexity
classes lower than “polynomial” complexity classes.

Abstract. We propose a framework for building deadlock-free systems
from deadlock-free components. The framework is based on a methodology
for the layered construction of systems by superposing three layers:
a layer of components, an interaction model, and a restriction layer. The
interaction model specifies the possible interactions between components.
The restriction layer restricts the behavior of the two lower layers by a
global constraint. Layered structuring allows separating three orthogonal
aspects in system construction. Apart from its methodological interest,
it makes technically possible the definition of a unique and powerful
associative composition operator.

We study sufficient deadlock-freedom conditions for systems built from 
deadlock-free components and given interaction model and restriction. 
We also provide a sufficient condition for individual deadlock-freedom of 
the components of such systems. 



1 Introduction 

Deadlock-freedom is an essential correctness property as it characterizes a
system’s ability to perform some activity over its lifetime. Deadlocks are the
most common source of errors in systems of concurrent processes. They occur
when processes share common resources or are in general subject to strong
synchronization constraints. In that case, a process may remain blocked as long
as a condition depending on the state of its environment is not satisfied.

It has often been argued that deadlock-freedom is not a relevant property
as there exist systems that never deadlock, e.g., hardware, time-triggered
systems [12,10], and synchronous systems [4,9,2,14]. In such systems,
components are never blocked by their environment as it is supposed that
component inputs are always available whenever requested. Nevertheless, it is
clear that if some strong coordination between components is needed, e.g., they
share a common resource, then this can be achieved by a deadlock-free protocol
where the values exchanged between components are used to encode the
information necessary for safely implementing the coordination. Thus, for
systems that are by construction deadlock-free, verifying mutual exclusion
requires analysis of the sequences of the values exchanged between the
coordinating components.

Another argument for not considering deadlock-freedom as a relevant property
is that any system can become trivially deadlock-free by adding some idle
action loop that does not modify its overall observable behavior whenever there
is a risk of deadlock, e.g., at waiting states. Such a modification allows
elimination of deadlocks but it leads to systems where it is possible to
indefinitely postpone interaction between components by privileging idle
actions. Thus, instead of checking for deadlock-freedom, other properties such
as livelock-freedom and fairness must be checked for system correctness.

The above remarks show that building deadlock-free systems requires the 
definition of an appropriate setting where absence of deadlock means satisfaction 
of strong coordination properties. In this paper we propose a framework for 
building deadlock-free systems from deadlock-free components. The framework 
is based on a methodology for the layered construction of systems by superposing 
three layers (figure 1). 





[Figure: three stacked layers; from top to bottom: Restriction by U,
Interaction Model, Components.]

Fig. 1. Layered system description.



– The bottom layer consists of a set of components. They are characterized by
the set of actions they can perform and their behavior. The latter is specified
as a transition system representing the effect of actions on component states.

– The intermediate layer is an interaction model. This is used to specify the
possible interactions between the components. An interaction is the result
of the simultaneous occurrence (synchronization) of actions from different
components. Furthermore, an interaction model specifies the set of incomplete
interactions, that is, the interactions that the system cannot execute
without synchronizing with its environment. Interaction models are a general
mechanism for specifying various kinds of parallel composition [8].

– The upper layer is a restriction by a constraint (predicate) on the system
state. Restriction is used to prevent the behavior of the underlying layers
from executing interactions that violate the constraint.

System behavior is obtained by successive application of the meaning of each
layer. The behavior of components is composed as specified by the interaction
model. Then it is restricted by application of a global constraint. We believe
that the proposed construction methodology is general enough as it combines
usual parallel composition with techniques relying on the use of constraints
such as invariants. For example, in a multitasking system, interaction models
can typically be used to describe synchronization for mutual exclusion and
restriction can be used to express scheduling policies [1].

We provide sufficient conditions for deadlock-freedom of systems built from 
deadlock-free components for given interaction model and restriction. The results 
assume that for each component a deadlock-free invariant is given, that is a set 
of states from which the component can block only because of its environment. 
The condition relates the deadlock-free invariants of the system components to 
the enabling conditions of the interactions of the layered system. 

We also provide a sufficient condition for individual deadlock-freedom of the 
components in a system. A component is individually deadlock-free if it can 
always perform an action. 

The paper is organized as follows. 

Section 2 describes the layered construction principle. It deals essentially with 
the presentation of the interaction models and their properties. Component be- 
haviors are described as simple non deterministic loops with guarded commands 
[7,16]. The concept of restriction is taken from [1]. 

Section 3 presents the result for global and individual deadlock-freedom. 
Section 4 presents concluding remarks about the presented framework. 

2 Composition 

We present a composition operation on layered systems. The operation composes
each layer separately. The bottom layer of the product is the union of
the bottom layers of the operands. The interaction model of the product is
obtained as the union of the interaction models of the operands with some
“glue” interaction model. That is, the composition operator is parameterised
with information about the interactions between the operands. Finally, the
restriction of the product is the conjunction of the restrictions of the
operands (figure 2).

Fig. 2. Layered composition.



The two following sub-sections present the interaction models and their 
composition. 

2.1 Interaction Models 

Consider a finite set $K$ of components with disjoint finite vocabularies of 
actions $A_i$ for $i \in K$. We put $A = \bigcup_{i \in K} A_i$. 

A connector $c$ is a non-empty subset of $A$ such that 
$\forall i \in K .\ |A_i \cap c| \le 1$. 
A connector defines a maximally compatible set of interacting actions. For the 
sake of generality, our definition accepts singleton connectors. The use of the 
connector $\{a\}$ in a description is interpreted as the fact that action $a$ 
cannot be involved in interactions with other actions. 

Given a connector $c$, an interaction $\alpha$ of $c$ is any term of the form 
$\alpha = a_1 | \dots | a_n$ such that $\{a_1, \dots, a_n\} \subseteq c$. As usual 
[15,3], we assume that $|$ is a binary associative and commutative operator. It 
is used to denote some abstract and partial action composition operation. The 
interaction $a_1 | \dots | a_n$ is the result of the occurrence of the actions 
$a_1, \dots, a_n$. For $\alpha$ and $\alpha'$ interactions, we write 
$\alpha | \alpha'$ to denote the interaction resulting from their composition (if 
it is defined). 

Notice that if $\alpha = a_1 | \dots | a_n$ is an interaction then any term 
corresponding to a sub-set of $\{a_1, \dots, a_n\}$ is an interaction. By analogy, 
we say that $\alpha'$ is a sub-interaction of $\alpha$ if 
$\alpha = \alpha' | \alpha''$ for some interaction $\alpha''$. Clearly, actions 
are minimal interactions. 

The set of the interactions of a connector $c = \{a_1, \dots, a_n\}$, denoted by 
$I(c)$, consists of all the interactions corresponding to sub-sets of $c$ (all the 
sub-interactions of $c$). We extend the notation to sets of connectors. If $C$ is 
a set of connectors then $I(C)$ is the set of its interactions. Clearly, for 
$C_1, C_2$ sets of connectors, $I(C_1 \cup C_2) = I(C_1) \cup I(C_2)$. 
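As a minimal sketch of these definitions, assuming that an interaction is represented simply by the set of actions it involves (which is adequate because the composition operator is associative and commutative), the set of interactions of a connector can be enumerated as its non-empty subsets. The connector contents below are hypothetical examples, not taken from the paper's figures.

```python
from itertools import combinations

def interactions(connector):
    """Return I(c): every non-empty subset of the connector, as frozensets."""
    actions = sorted(connector)
    return {
        frozenset(subset)
        for size in range(1, len(actions) + 1)
        for subset in combinations(actions, size)
    }

def interactions_of_set(connectors):
    """Return I(C): the union of I(c) over all connectors c in C."""
    result = set()
    for connector in connectors:
        result |= interactions(connector)
    return result

# Hypothetical connectors (illustration only).
C1 = {frozenset({"put", "get"})}
C2 = {frozenset({"out", "in1"}), frozenset({"out", "in2"})}

# I distributes over the union of sets of connectors.
assert interactions_of_set(C1 | C2) == interactions_of_set(C1) | interactions_of_set(C2)
```

Representing interactions as frozensets makes sub-interaction a plain subset test.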

Definition 1 (Set of connectors). The set of connectors of a system consisting 
of a set of components $K$ with disjoint action vocabularies $A_i$ for $i \in K$, 
is a set $C$ such that $\bigcup_{c \in C} c = \bigcup_{i \in K} A_i$, and if 
$c \in C$ then there exists no $c' \in C$ with $c \subsetneq c'$. That is, $C$ 
contains only maximal sets. 



Definition 2 (Interaction model). The interaction model of a system composed of 
a set of components $K$ with a set of connectors $C$ is a pair 
$IM = (I(C), I(C)^-)$ where $I(C)^- \subseteq I(C)$ is the set of the incomplete 
interactions, such that it contains no maximal interactions and 
$\forall b, b' \in I(C)$, $b \in I(C)^-$ and $b' \subseteq b$ implies 
$b' \in I(C)^-$. We denote by $I(C)^+$ the set of the complete (non incomplete) 
interactions. Clearly, $I(C)^+$ contains all the maximal interactions of $I(C)$ 
and is such that $\forall b, b' \in I(C)$, $b \in I(C)^+$ and $b \subseteq b'$ 
implies $b' \in I(C)^+$. 

Notice that any action appears in some connector. The requirement that $C$ 
contains only maximal sets ensures a bijective correspondence between the set 
of connectors $C$ and the corresponding set of interactions $I(C)$. Given $I(C)$, 
the corresponding set of connectors is uniquely defined and is $C$. To simplify 
notation, we write $IC$ instead of $I(C)$. 

The distinction between complete and incomplete interactions is essential 
for building correct models. As models are built incrementally, interactions are 
obtained by composing actions. It is often necessary to express the constraint 
that some interactions of a sub-system are not interactions of the system. This is 
typically the case for binary strict synchronization (rendez-vous). For example, 
send and receive should be considered as incomplete actions but send|receive as 
complete. The occurrence of send or receive alone in a system model is an error 
because it violates the assumption about strict synchronization made by the 
designer. 

The execution of a complete interaction by a component does not require 
synchronization with interactions of its environment. The execution of an 
incomplete interaction requires synchronization with some other interaction to 
produce a larger one which may be either complete or incomplete. Incompleteness 
of an interaction implies the obligation to synchronize when the environment 
offers matching interactions as specified by the connectors. 

The distinction between complete and incomplete interactions encompasses 
many other distinctions such as output/input, internal/external, uncontrollable/ 
controllable used in different modeling formalisms. Clearly, internal actions of 
components should be considered as complete because they can be performed 
independently of the state of their environment. In some formalisms, output 
actions are complete (synchronous languages, asynchronous buffered 
communication). In some others such as CSP [11] and Lotos [17], all synchronizing 
actions are incomplete. 

Often it is convenient to consider that the complete interactions $IC^+$ are 
defined from a given set of complete actions $A^+ \subseteq A$. That is, $IC^+$ 
consists of all the interactions of $IC$ where at least one complete action 
(element of $A^+$) is involved. In the example of figure 3, we give sets of 
connectors and complete actions to define interaction models. By convention, 
bullets represent incomplete actions and triangles complete actions. In the 
partially ordered set of the interactions, full nodes denote complete 
interactions. The interaction between put and get, represented by the interaction 
put|get, is a rendez-vous, meaning that synchronization is blocking for both 
actions. The interaction between out and in is asymmetric as out can occur alone 
even if in is not possible. Nevertheless, the occurrence of in requires the 
occurrence of out. The interactions between out, $in_1$ and $in_2$ are 
asymmetric. The output out can occur alone or in synchronization with any of the 
inputs $in_1$, $in_2$. 
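The convention above can be sketched in a few lines. The sketch assumes interactions are represented as sets of actions, and it combines two readings of the text: an interaction is taken as complete if it involves a complete action, or if it is maximal (Definition 2 requires maximal interactions to be complete, which is what makes put|get complete in the rendez-vous case). The connector sets are illustrative guesses at the examples described, since the figure itself is not reproduced here.

```python
from itertools import combinations

def interactions(connectors):
    """All non-empty subsets of every connector."""
    result = set()
    for c in connectors:
        items = sorted(c)
        for size in range(1, len(items) + 1):
            result.update(frozenset(s) for s in combinations(items, size))
    return result

def complete_interactions(connectors, complete_actions):
    """Interactions containing a complete action, plus all maximal ones."""
    maximal = {frozenset(c) for c in connectors}
    return {
        i for i in interactions(connectors)
        if i & complete_actions or i in maximal
    }

# Rendez-vous: neither action is complete, only put|get may occur.
rendezvous = complete_interactions([{"put", "get"}], frozenset())
# Asymmetric synchronization: out is complete and may occur alone.
asymmetric = complete_interactions([{"out", "in"}], frozenset({"out"}))
```

Under these assumptions `rendezvous` contains only the joint interaction, while `asymmetric` contains out alone and out|in, but not in alone.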



2.2 Incremental Description of Interaction Models 

Consider the interaction model $IM = (IC, IC^-)$ of a set of interacting 
components $K$ with disjoint action vocabularies $A_i$ for $i \in K$. $IC$ and 
$IC^-$ denote the sets of interactions and incomplete interactions, 
respectively, on the vocabulary of actions $A = \bigcup_{i \in K} A_i$. 

Definition 3 (Glue connectors). Given a set of disjoint subsets 
$K_1, \dots, K_n$ of $K$, we denote by $C[K_1, \dots, K_n]$ the set of the 
connectors having at least one action in each set of components, that is, 
$C[K_1, \dots, K_n] = \{c = c_1 \cup \dots \cup c_n \mid \forall i \in [1, n] .\ c_i \in C[K_i] \wedge c \in C[K]\}$. 

Fig. 3. Interaction models. 



Clearly, $C[K_1, \dots, K_n]$ is the set of the connectors of 
$IM[K_1 \cup \dots \cup K_n]$ which are not connectors of any $IM[K']$ for any 
subset $K'$ of at most $n - 1$ elements from $\{K_1, \dots, K_n\}$. Notice that 
when the partition consists of only one set, then the above definition agrees 
with Definition 1. 

Proposition 1. Given $K_1, K_2, K_3$ three disjoint subsets of $K$, 

$IC[K_1 \cup K_2] = IC[K_1] \cup IC[K_2] \cup IC[K_1, K_2]$ 

$IC[K_1 \cup K_2, K_3] = IC[K_1, K_3] \cup IC[K_2, K_3] \cup IC[K_1, K_2, K_3]$ 



Proof. The first equality comes from the fact that 
$C[K_1] \cup C[K_2] \cup C[K_1, K_2]$ contains all the connectors of 
$C[K_1 \cup K_2]$ and other sets of actions that are not maximal. By definition, 
$IC$ contains all the sub-sets of $C$. Thus, 
$IC[K_1 \cup K_2] = I(C[K_1] \cup C[K_2] \cup C[K_1, K_2]) = IC[K_1] \cup IC[K_2] \cup IC[K_1, K_2]$. 

The second equality comes from the fact that 
$C[K_1, K_3] \cup C[K_2, K_3] \cup C[K_1, K_2, K_3]$ contains all the connectors 
of $C[K_1 \cup K_2, K_3]$ and, in addition, other sets of actions that are not 
maximal. By definition, $IC$ contains all the sub-sets of $C$. Thus, 
$IC[K_1 \cup K_2, K_3] = I(C[K_1, K_3] \cup C[K_2, K_3] \cup C[K_1, K_2, K_3])$, 
from which we get the result by distributivity of $I$ over union. ■ 



Definition 4 (Union of incomplete interactions). Consider two sets of 
connectors $C_1$, $C_2$ and the corresponding sets of interactions $IC_1$, 
$IC_2$. We take $(IC_1)^- \cup (IC_2)^- = (IC_1 \cup IC_2)^-$. 

This definition combined with proposition 1 allows one to compute the incomplete 
interactions of a system from the incomplete interactions of its components and 
thus provides a basis for incremental description of interaction models. 

Property 1. For $K_1, K_2, K_3$ three disjoint subsets of $K$, 

$IC[K_1 \cup K_2]^- = IC[K_1]^- \cup IC[K_2]^- \cup IC[K_1, K_2]^-$ 

$IM[K_1 \cup K_2] = (IC[K_1 \cup K_2], IC[K_1 \cup K_2]^-) = IM[K_1] \cup IM[K_2] \cup IM[K_1, K_2]$ 

$IM[K_1 \cup K_2, K_3] = IM[K_1, K_3] \cup IM[K_2, K_3] \cup IM[K_1, K_2, K_3]$ 

By using this property, we get the following expansion formula: 

Proposition 2 (Expansion formula). 

$IM[K_1 \cup K_2 \cup K_3] = IM[K_1] \cup IM[K_2] \cup IM[K_3] \cup IM[K_1, K_2] \cup IM[K_1, K_3] \cup IM[K_2, K_3] \cup IM[K_1, K_2, K_3]$. 



2.3 Composition Semantics and Properties 

We need the following definitions. 

Definition 5 (Transition system). A transition system $B$ is a tuple 
$(X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$, where 

— $X$ is a finite set of variables; 

— $IC$ is a finite set of interactions; 

— $G^\alpha$ is a predicate on $\mathbf{X}$, the set of valuations of $X$, called guard; 

— $F^\alpha : \mathbf{X} \to \mathbf{X}$ is a transition function. 

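Definition 6 (Semantics of a transition system). A transition system 
$(X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ defines a 
transition relation $\to \subseteq \mathbf{X} \times IC \times \mathbf{X}$ as 
follows: 
$\forall x, x' \in \mathbf{X}\ \forall \alpha \in IC .\ x \xrightarrow{\alpha} x' \iff G^\alpha(x) \wedge x' = F^\alpha(x)$. 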

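Definition 7 (Constraint). A constraint on a transition system 
$B = (X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ is a 
term of the form $U^X \wedge \bigwedge_{\alpha \in IC} t^\alpha(U^\alpha)$ where 

— $U^X$ is a state predicate on $X$; 

— $t^\alpha(U^\alpha)$ is an action predicate such that 
$\forall x \in \mathbf{X} .\ t^\alpha(U^\alpha)(x)$ if $G^\alpha(x) \Rightarrow U^\alpha(x)$. 

That is, $U^X \wedge \bigwedge_{\alpha \in IC} t^\alpha(U^\alpha)$ holds at states 
which satisfy $U^X$ and from which only actions satisfying $U^\alpha$ are 
executable. 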

Definition 8 (Restriction). The restriction of a transition system 
$B = (X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ with a 
constraint $U = U^X \wedge \bigwedge_{\alpha \in IC} t^\alpha(U^\alpha)$ is the 
transition system 
$B/U = (X, IC, \{(G^\alpha)'\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ 
where $(G^\alpha)' = G^\alpha \wedge U^X \wedge U^\alpha \wedge U^X([F^\alpha(x)/x])$. 

A system $S$ is described by a term $S = (B, IM)/U$ where $IM = (IC, IC^-)$ 
is an interaction model, $B$ is a transition system with set of interactions 
$IC$ describing the behavior of its components, and $U$ is a constraint. 

We define the behavior of $S$ as the transition system $B'/U$ where $B'$ is the 
transition system obtained from $B$ by removing all its incomplete interactions, 
that is $B' = (X, IC^+, \{G^\alpha\}_{\alpha \in IC^+}, \{F^\alpha\}_{\alpha \in IC^+})$, 
where $IC^+$ is the set of the complete interactions of $IM$. 
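The guard strengthening of Definition 8 can be illustrated with a minimal sketch. It assumes valuations are dictionaries from variable names to integers and that guards, transition functions and predicates are ordinary Python callables; the `Transition` class, the `restrict` helper and the two-task example are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]  # a valuation of the variables X

@dataclass
class Transition:
    guard: Callable[[State], bool]    # G^alpha
    update: Callable[[State], State]  # F^alpha

def restrict(transitions, state_pred, action_preds):
    """Build B/U: (G)' = G and U^X and U^alpha and U^X after applying F."""
    restricted = {}
    for name, t in transitions.items():
        u_alpha = action_preds.get(name, lambda x: True)
        def guard(x, t=t, u_alpha=u_alpha):
            return (t.guard(x) and state_pred(x) and u_alpha(x)
                    and state_pred(t.update(x)))
        restricted[name] = Transition(guard, t.update)
    return restricted

# Hypothetical example: two tasks, each idle (0) or in its critical section (1).
B = {
    "enter1": Transition(lambda x: x["t1"] == 0, lambda x: {**x, "t1": 1}),
    "leave1": Transition(lambda x: x["t1"] == 1, lambda x: {**x, "t1": 0}),
    "enter2": Transition(lambda x: x["t2"] == 0, lambda x: {**x, "t2": 1}),
    "leave2": Transition(lambda x: x["t2"] == 1, lambda x: {**x, "t2": 0}),
}
mutex = lambda x: x["t1"] + x["t2"] <= 1  # U^X: mutual exclusion
restricted = restrict(B, mutex, {})       # no action predicates in this example
```

In the restricted system, `enter2` is disabled whenever task 1 is in its critical section, because executing it would lead to a state violating the state predicate.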

As in the previous sub-section, we consider that $S = (B, IM)/U$ is built 
from a set of interacting components $K$ with disjoint action vocabularies 
$A_i$, $i \in K$, and behaviors described by transition systems 
$(X_i, A_i, \{G^a\}_{a \in A_i}, \{F^a\}_{a \in A_i})$ with disjoint sets of 
variables. 

We denote by $S[K]$ the system built from the components $k \in K$, and assume 
that it has the layered structure $S[K] = (B[K], IM[K])/U$ with interaction model 
$IM[K]$ and constraint $U$. 

We define a composition operator $\|$ that yields, for disjoint sub-sets 
$K_1$, $K_2$ of $K$, the system $S[K_1 \cup K_2]$ as the composition of the 
sub-systems $S[K_1]$, $S[K_2]$ for a given interaction model $IM[K_1, K_2]$ 
connecting the two sub-systems. The operator composes separately behaviors, 
interaction models and restrictions of the sub-systems. 




Fig. 4. The composition principle. 

Definition 9 ($\|$). The composition of two systems $S[K_1]$ and $S[K_2]$ is 
the system 

$S[K_1 \cup K_2] = (B[K_1], IM[K_1])/U_1 \;\|\; (B[K_2], IM[K_2])/U_2$ 

$= (B[K_1] \times B[K_2], IM[K_1] \cup IM[K_2] \cup IM[K_1, K_2])/(U_1 \wedge U_2)$ 

where $\times$ is a binary associative behavior composition operator such that 
$B[K_1] \times B[K_2] = (X_1 \cup X_2, IC[K_1 \cup K_2], \{G^\alpha\}_{\alpha \in IC[K_1 \cup K_2]}, \{F^\alpha\}_{\alpha \in IC[K_1 \cup K_2]})$ 
where for $\alpha = \alpha_1 | \alpha_2 \in IC[K_1, K_2]$, 
$G^\alpha = G^{\alpha_1} \wedge G^{\alpha_2}$ and for any valuation $(x_1, x_2)$ 
of $X_1 \cup X_2$, $F^\alpha(x_1, x_2) = (F^{\alpha_1}(x_1), F^{\alpha_2}(x_2))$. 

Due to property 1 we have 
$(B[K_1], IM[K_1])/U_1 \;\|\; (B[K_2], IM[K_2])/U_2 = (B[K_1 \cup K_2], IM[K_1 \cup K_2])/(U_1 \wedge U_2)$, 
which means that composition of sub-systems gives the system corresponding to 
the union of their components. Notice that as $\times$ is an associative 
operator, composition is associative: 

$((B[K_1], IM[K_1])/U_1 \;\|\; (B[K_2], IM[K_2])/U_2) \;\|\; (B[K_3], IM[K_3])/U_3$ 

$= (B[K_1 \cup K_2], IM[K_1 \cup K_2])/(U_1 \wedge U_2) \;\|\; (B[K_3], IM[K_3])/U_3$ 

$= (B[K_1] \times B[K_2] \times B[K_3], IM[K_1 \cup K_2] \cup IM[K_3] \cup IM[K_1 \cup K_2, K_3])/(U_1 \wedge U_2 \wedge U_3)$ 

$= (B[K_1 \cup K_2 \cup K_3], IM[K_1 \cup K_2 \cup K_3])/(U_1 \wedge U_2 \wedge U_3)$ 

by application of proposition 2. Thus we have the following proposition. 

Proposition 3. $\|$ is a commutative and associative operator on systems. 

3 Deadlock Freedom by Construction 

3.1 Global Deadlock Freedom 

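Definition 10 (Invariant). A constraint 
$U = U^X \wedge \bigwedge_{\alpha \in IC} t^\alpha(U^\alpha)$ on 
$B = (X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ is an 
invariant if 
$\forall \alpha \in IC\ \forall x \in \mathbf{X} .\ U^X(x) \wedge G^\alpha(x) \Rightarrow U^X([F^\alpha(x)/x]) \wedge U^\alpha(x)$. 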

The following properties are easy to prove and relate restriction to invariants. 

Property 2. Given a transition system $B$ and constraints $U$, $U_1$, $U_2$, 

— $U$ is an invariant of $B/U$; 

— if $U_i$ is an invariant of $B_i$, $i = 1, 2$, then $U_1 \wedge U_2$ is an 
invariant of $(B_1 \times B_2, IM)$ for any $IM$, and of $(B_1 \times B_2, IM)/U$; 

— $(B/U_1)/U_2 = B/(U_1 \wedge U_2)$; 

— if $U$ is an invariant of $B$ then $B/U$ is bisimilar to $B$ from any state 
satisfying $U$. 

As in the previous section, consider a system $S = (\times_{k \in K} B_k, IM)/U$ 
built from a set of interacting components $K$ where the transition systems 
$B_k = (X_k, A_k, \{G^a\}_{a \in A_k}, \{F^a\}_{a \in A_k})$ have disjoint action 
vocabularies and sets of variables. We assume that $IM = (IC, IC^-)$, and 
$B = (\times_{k \in K} B_k, IM) = (X, IC^+, \{G^\alpha\}_{\alpha \in IC^+}, \{F^\alpha\}_{\alpha \in IC^+})$. 
In this section we study deadlock-freedom of $S = B/U$ and of its components. 

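Let $B/U = (X, IC^+, \{(G^\alpha)'\}_{\alpha \in IC^+}, \{F^\alpha\}_{\alpha \in IC^+})$ 
be the restriction of $B$ by some constraint 
$U = U^X \wedge \bigwedge_{\alpha \in IC^+} t^\alpha(U^\alpha)$ with restricted 
guards $(G^\alpha)' = G^\alpha \wedge U^X \wedge U^\alpha \wedge U^X([F^\alpha(x)/x])$ 
for any interaction $\alpha \in IC^+$. 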

Definition 11 (Complete states). Given $S = (B, IM)$, 
$U = U^X \wedge \bigwedge_{\alpha \in IC^+} t^\alpha(U^\alpha)$, and $k$ some 
component of $B$, the set of complete states of $k$, that is, states of $k$ from 
which its progression cannot be prevented by its environment, is characterized 
by the largest predicate $complete(k)$ on $X_k$ such that 
$complete(k) \Rightarrow dlf_k \wedge \bigvee_{\alpha \in IC^+} (G^\alpha)'$. 

Definition 12 (Blocking states). For $k \in K$, let $dlf_k$ be some non-empty 
invariant on $X_k$ implying deadlock-freedom of $k$, that is, 
$dlf_k \Rightarrow \bigvee_{a \in A_k} G^a$. We take 
$dlf = \bigwedge_{k \in K} dlf_k$ and define for a component $k$ the predicate 

$blocking(k) = dlf_k \wedge \neg \bigvee_{\alpha \in IC^+,\ \alpha \cap A_k \neq \emptyset} (G^\alpha)'$ 

characterizing the states where $k$ is blocked due to interaction or restriction 
with $U$. 

Definition 13 (Dependency graph). Consider a system $S = (B, IM)/U$ 
built from a set $K$ of components. For each component $k$ put each predicate 
$blocking(k)$ in the form $\bigvee_{i \in I_k} c_i$ with 
$c_i = \bigwedge_{k' \in K} D_{k',i}$ where $D_{k',i}$ is a predicate 
depending on $X_{k'}$. The dependency graph of $S$ is a labelled bipartite graph 
with two sets of nodes: the components of $K$, and constraint nodes 
$\{c_i \mid \exists k \in K .\ i \in I_k\}$, where $I_k$ is the set of 
conjunctive terms occurring in $blocking(k)$. For a component node $k$ and a 
constraint node $c_i$, 

— there is an edge from $k$ to $c_i$ labeled with $D_{k,i}$ if $D_{k,i} \neq false$; and 

— there is an edge from $c_i$ to $k$ labeled with $D_{k,i}$ if $i \in I_k$. 

Notice that constraint nodes represent conditions under which a component 
is blocked. If $c$ is a constraint node for a component $k$ then it has at least 
one incoming and one outgoing edge. At some state, $k$ is blocked if all the 
predicates labelling the incoming edges of $c$ are true. 

Let $\gamma$ be a circuit of the dependency graph of a system $(B, IM)/U$. The 
predicate 

$DL(\gamma) = \bigwedge_{e \in \gamma} D_e$, where $D_e$ is the predicate labelling the edge $e$ of $\gamma$, 

characterizes the system states for which all components in $\gamma$ may be 
blocked cyclically awaiting each other. 

Theorem 1 (Deadlock freedom). A system $(B, IM)/U$ is deadlock-free from 
any state satisfying $dlf \wedge U^X$ if its dependency graph contains a 
non-empty sub-graph $G$ such that 

— if $G$ contains a component node then $G$ contains all its predecessor 
constraint nodes, and if $G$ contains a constraint node then $G$ contains one of 
its predecessors; and 

— for any elementary circuit $\gamma$ of $G$, $DL(\gamma) = false$. 

Proof. Assume that the system is at some global deadlock state $x$. As all the 
components are initialised at states satisfying $dlf$, there exists for each 
component at least one action $a$ having its guard $G^a$ enabled at this state 
in $B$. 

Consider a component $k$ of $G$ and a term $c_i$ of $blocking(k)$ such that 
$c_i(x)$. As $dlf_k(x)$, the label of the input edge $(c_i, k)$ is true at this 
state (the guards are contained in $dlf_k$). Then consider some predecessor $k'$ 
of $c_i$ in $G$. The label of the edge $(k', c_i)$ is true as it is a factor of 
$c_i$. Again, the component $k'$ has at least one guard enabled. Move backwards 
in $G$ from this node by iterating the process. Then an elementary circuit 
$\gamma$ of $G$ is found. By construction, all the predicates labelling the edges 
of this circuit are true at state $x$. This contradicts the assumption of 
$DL(\gamma) = false$. ■ 



Theorem 2 (Deadlock freedom). $(B, IM)/U$ is deadlock-free from 
$dlf \wedge U^X$ if $dlf \wedge U^X \Rightarrow \bigvee_{\alpha \in IC^+} (G^\alpha)'$. 

Proof. Since for any $k \in K$, $dlf_k$ is invariant by hypothesis, $dlf$ and 
$dlf \wedge U^X$ are by property 2 invariants of $(B, IM)/U$. As 
$dlf \wedge U^X$ implies enabledness of some complete interaction, its 
invariance amounts to deadlock-freedom. ■ 
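For a finite-state system, the condition of Theorem 2 can be checked by plain enumeration, as in the following sketch. It assumes an explicit state set, takes every action predicate as true, and uses a hypothetical two-task example (states are pairs of task modes); none of these names come from the paper. The function returns the states satisfying the deadlock-free invariant and the state predicate in which no restricted guard holds, so an empty list means the condition is satisfied.

```python
from itertools import product

# Each interaction: (guard G, update F) on states (t1, t2); 0 = idle, 1 = critical.
INTERACTIONS = {
    "enter1": (lambda s: s[0] == 0, lambda s: (1, s[1])),
    "leave1": (lambda s: s[0] == 1, lambda s: (0, s[1])),
    "enter2": (lambda s: s[1] == 0, lambda s: (s[0], 1)),
    "leave2": (lambda s: s[1] == 1, lambda s: (s[0], 0)),
}
STATES = list(product((0, 1), repeat=2))

def blocked_states(u_x, dlf=lambda s: True):
    """States satisfying dlf and U^X where no restricted guard (G)' holds."""
    def restricted(guard, update, s):
        # (G)' = G and U^X and U^X after the update (action predicates omitted).
        return guard(s) and u_x(s) and u_x(update(s))
    return [
        s for s in STATES
        if dlf(s) and u_x(s)
        and not any(restricted(g, f, s) for g, f in INTERACTIONS.values())
    ]

mutex_ok = blocked_states(lambda s: s[0] + s[1] <= 1)     # mutual exclusion
too_strict = blocked_states(lambda s: s[0] + s[1] == 1)   # exactly one task critical
```

With mutual exclusion as the state predicate no state is blocked. The stricter predicate blocks both of its states, since every move leaves the predicate, which is the kind of diagnostic the dependency graph is meant to refine.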



Theorem 3. The conditions of theorem 1 imply that 
$dlf \wedge U^X \Rightarrow \bigvee_{\alpha \in IC^+} (G^\alpha)'$. 

Proof (sketch). Suppose that there is some non-empty sub-graph $G$ of the 
dependency graph as specified in theorem 1 such that for any elementary circuit 
$\gamma$ of $G$, $DL(\gamma) = false$. Thus, for any valuation $x$ such that 
$dlf(x)$ there is some component $k$ in $G$ with $\neg blocking(k)(x)$ by 
construction of the dependency graph. By definition of $blocking$, it follows 
that there is some interaction $\alpha$ such that $(G^\alpha)'(x)$. 

The condition of theorem 2 allows an efficient check for deadlock freedom. 
However, in order to provide a diagnostic when this condition is not verified, it 
may be useful to construct a dependency graph for the states satisfying $dlf \wedge$ 

3.2 Individual Deadlock-Freedom 

We give some results about deadlock-freedom preservation for transition 
systems. Similar results have been obtained for timed transition systems with 
priorities in [5]. 

In general, deadlock-freedom or even liveness of a system of interacting 
components does not imply that any component remains deadlock-free in the system. 
Guaranteeing this stronger property is the objective of the following definitions 
and theorem. 

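Definition 14 (Run). A run of a transition system 
$B = (X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ is an 
infinite sequence of interactions 
$x_0 \xrightarrow{\alpha_0} x_1 \xrightarrow{\alpha_1} \dots x_n \xrightarrow{\alpha_n} \dots$. 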



Definition 15 (Individual deadlock-freedom). Given a system $S$, a component 
$k \in K$ is deadlock-free in $S$ if for any run $\sigma$ of $S$ and any prefix 
$\sigma_n$ of $\sigma$, there exists a run $\sigma'$ such that 
$\sigma_n \sigma'$ is a run of $S$, and some interaction of $\sigma'$ contains an 
interaction of $k$. 



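Definition 16 (Controllable predecessors). Let 
$B = (X, IC, \{G^\alpha\}_{\alpha \in IC}, \{F^\alpha\}_{\alpha \in IC})$ be the 
behavior of a component. For $Y \subseteq \mathbf{X}$, define 
$pre(Y) \subseteq \mathbf{X}$ such that $x \in pre(Y)$ if 

— if $x$ is complete then 
$\exists x' \in \mathbf{X}\ \exists \alpha \in IC^+ .\ x \xrightarrow{\alpha} x' \wedge x' \in Y$; 

— if $x$ is incomplete then 
$\forall x' \in \mathbf{X}\ \forall \alpha \in IC^- .\ x \xrightarrow{\alpha} x' \Rightarrow x' \in Y$, 
and such $\alpha$ and $x'$ exist. 

For $X_0 \subseteq \mathbf{X}$ we denote by $PRE(X_0)$ the least solution of 
$Y = X_0 \cup pre(Y)$. 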

Clearly, $PRE(X_0)$ exists, as it is the fixed point of a monotonic functional. 
$PRE(X_0)$ represents the set of the predecessors of $X_0$ in the transition 
graph such that from any one of its states a state of $X_0$ can be reached by 
appropriately choosing complete interactions. In this context, complete 
interactions can be characterized as controllable, as when they are enabled some 
interaction containing them can occur in the product system. On the contrary, 
incomplete interactions are uncontrollable as their occurrence in the product 
depends on the state of the environment. Predicate transformers taking into 
account controllability have been studied in [13]. 
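The least fixed point can be computed by straightforward iteration on an explicit transition graph, as in the sketch below. Two simplifying assumptions are made and are not taken from the paper: the graph is given as a list of (source, interaction, destination) triples, and a state is treated as complete exactly when some complete interaction is enabled in it. The component and its interaction names are hypothetical.

```python
def pre(target, transitions, complete):
    """One-step controllable predecessors of `target` (in the style of Definition 16)."""
    sources = {src for src, _, _ in transitions}
    result = set()
    for x in sources:
        out = [(a, dst) for src, a, dst in transitions if src == x]
        if any(a in complete for a, _ in out):
            # Complete state: one complete interaction into the target suffices.
            if any(a in complete and dst in target for a, dst in out):
                result.add(x)
        elif all(dst in target for _, dst in out):
            # Incomplete state: every (incomplete) move must lead into the target.
            result.add(x)
    return result

def PRE(x0, transitions, complete):
    """Least solution of Y = X0 ∪ pre(Y), obtained by iterating from X0."""
    y = set(x0)
    while True:
        nxt = y | pre(y, transitions, complete)
        if nxt == y:
            return y
        y = nxt

COMPLETE = {"prepare", "skip", "finish"}
GOOD = [("idle", "prepare", "ready"), ("idle", "skip", "idle"),
        ("ready", "sync", "busy"), ("busy", "finish", "idle")]
BAD = GOOD + [("ready", "abort", "idle")]  # an uncontrollable way out of "ready"

controllable = PRE({"busy"}, GOOD, COMPLETE)
not_controllable = PRE({"busy"}, BAD, COMPLETE)
```

In the first graph every state can be steered to `busy`, so the result is the whole state set. Adding the incomplete `abort` move means the environment may divert the component from `ready`, and the fixed point shrinks to `{"busy"}`.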

Definition 17 (Controllability). Given a system $S$, we call a component 
$k \in K$ with behavior $(X_k, A_k, \{G^a\}_{a \in A_k}, \{F^a\}_{a \in A_k})$ 
controllable with respect to some state constraint $U$ on $X_k$ if 
$PRE(U) = \mathbf{X}_k$. 



Theorem 4 (Individual Deadlock-Freedom). Given a system $S = (B, IM)$ 
built from a set $K$ of components, a component $k \in K$ is deadlock-free from 
$dlf$ in $S$ if 

— the dependency graph of $S$ contains a sub-graph $G$ satisfying the 
requirements of theorem 1 with $k \in K'$, where $K'$ is the set of component 
nodes in $G$, 

— any $k \in K'$ is controllable with respect to $G^a$ for any $a \in A_k$ such 
that $\exists \alpha \in IC\ \exists \alpha' \in IC^- .\ a \in \alpha \wedge \alpha' \subseteq \alpha$, and 

— all n-ary interactions with $n \ge 3$ are rendez-vous, that is, for any 
interactions $\alpha \in IC^+$ with $|\alpha| \ge 3$ and $\alpha' \in IC$, we 
have (1) if $\alpha \cap \alpha' \neq \emptyset$, then $\alpha' \in IC^-$, and 
(2) if $a \in \alpha \cap A_i$ and $a' \in \alpha' \cap A_i$, then 
$G^a \wedge G^{a'} = false$ for any $i \in K$. 

Notice that under the hypotheses above, any component that is blocked is 
either waiting for one (or more) binary interactions, or for exactly one n-ary 
interaction with $n \ge 3$. 

Proof. Consider some product state $x$ in which $k$ is blocked, and no complete 
interaction involving $k$ is enabled. By theorem 1, some (direct or transitive) 
predecessor of $k$ can progress. Let 
$k_{i_1} \xrightarrow{\alpha_1} k_{i_2} \xrightarrow{\alpha_2} \dots \xrightarrow{\alpha_n} k$ 
be a chain of components in $K'$ where $k_i \xrightarrow{\alpha} k_j$ means that 
$k_j$ is in an incomplete state waiting for interaction $\alpha$ with $k_i$, and 
such that only $k_{i_1}$ is able to progress. By controllability, $k_{i_1}$ can be 
led (by appropriately choosing some complete action, or by any incomplete 
action) towards a state $x'$ where its action participating in $\alpha_1$ is 
enabled. If $\alpha_1$ is binary then this unblocks $k_{i_2}$; otherwise the 
interaction is by hypothesis a rendez-vous, and both components $k_{i_1}$ and 
$k_{i_2}$ remain blocked. In that case we apply the same reasoning to any chain 
of components blocking $k_{i_2}$. Finally, $\alpha_2$ will be enabled. The same 
reasoning can now be applied recursively, descending the chain, until $\alpha_n$ 
becomes enabled. ■ 

4 Discussion 

This work is in progress and needs further validation by examples and case 
studies. It pursues objectives similar to those of the work by Th. Henzinger and 
his colleagues [6]. It lies within the scope of a lasting research program. 

The paper presents compositionality and composability results for 
deadlock-freedom of systems built from components following the proposed 
construction methodology. The concept of component is very general and can be 
applied to various types of descriptions, e.g. a block of code or hardware, 
provided they have disjoint state spaces and well-defined interface and behavior. 
Interaction models provide a powerful framework for synchronization that 
encompasses both strict and non-strict synchronization. Restriction appears to 
be a very useful concept for imposing global invariants. The layered description 
principle allows separation of concerns and has been used to some extent in [1]. 
Furthermore, layered structuring is instrumental for the definition of an 
associative composition operation. 

The provided sufficient deadlock-freedom conditions require that components 
satisfy specific properties such as existence of non-trivial deadlock-free 
invariants, and of sets of controllable predecessors. These properties can be 
checked algorithmically only when the components are finite-state. 
Deadlock-freedom conditions also require the computation of the guards of the 
constructed system, which may be a source of exponential explosion for general 
interaction models and restrictions. 

Abstract. A key idea in cryptography is using hard functions in order to obtain 
secure schemes. The theory of hard functions (e.g. one-way functions) has been a 
great success story, and the community has developed a fairly strong understanding 
of what types of cryptographic primitives can be achieved under which assumption. 
We explore the idea of using moderately hard functions in order to achieve many 
tasks for which a perfect solution is impossible, for instance, combatting 
denial-of-service attacks. We survey some of the applications of such functions 
and in particular describe the properties moderately hard functions need for 
fighting unsolicited electronic mail. We suggest several research directions and 
(re)call for the development of a theory of such functions. 



1 Introduction 

Cryptography deals with methods for protecting the privacy, integrity and functionality 
of computer and communication systems. One of the key ideas of Cryptography is using 
intractable problems, i.e. problems that cannot be solved effectively by any feasible 
machine, in order to construct secure protocols. There are very tight connections between 
Complexity Theory and Cryptography (which was called “a match made in heaven” [34]). 
Over the last 20 years the theory of hard functions (e.g. one-way functions) has been 
a great success story and we have a pretty good understanding of which tasks require 
which computational assumptions (see Goldreich’s book(s) [31] for a recent survey). 

However, as we will see later, many tasks, ranging from very concrete ones such 
as combatting spam mail to fairly abstract ones such as few-round zero-knowledge, 
require a finer notion of intractability, which we call moderate hardness. In general 
one can say that while there are many applications where moderate intractability is 
needed, the study of moderate hardness has been neglected, compared with strict 
intractability. 

Today, the Internet makes cryptographic protection more necessary than ever. It also 
provides better opportunities than ever for performing distributed tasks. But many of 
these applications are either impossible without the use of moderately hard functions 
(e.g. fair contract signing without trusted third parties [12], see below) or prone to abuse 
(various denial of service attacks, free-loading etc.). 

from the Israel Science Foundation. 

We survey some of the applications where moderate hardness was used in an essential 
way in Section 3, starting with an emphasis on spam fighting in Section 2. In Section 4 
we outline future research directions in the area. 

2 Pricing via Processing or Combatting Junk Mail 

Unsolicited commercial e-mail, or spam, is a problem that requires no introduction, as 
most of us are reminded of it daily. It is more than just an annoyance: it incurs huge 
infrastructure costs (storage, network connectivity etc.). There are several (not mutually 
exclusive) approaches for combatting spam mail, but the one that interests us is the 
computational approach to fighting spam, and, more generally, to combatting denial of 
service attacks. This approach was initiated by Dwork and Naor [24]: 

“If I don’t know you and you want to send me a message, then you must prove 
you spent, say, ten seconds of CPU time, just for me and just for this message.” 

The “proof of effort” is cryptographic in flavor; as explained below, it is a moderately 
hard to compute (but very easy to check) function of the message, the recipient, and 
a few other parameters. The system would work automatically and in the background, 
so that the typical user experience is unchanged. Note that filtering and computational 
spam fighting are complementary, and the techniques reinforce each other. 

The Computational Approach: In order to send a message m, the sender is required to 
compute a tag 

z = f(m, sender, receiver, time) 

for a moderately hard to compute “pricing” function f. The message m is transmitted 
together with z, the result of the computation. Software operating on behalf of the 
receiver checks that z is properly computed prior to making m visible to the receiver. 
It is helpful to think of the tag z as a proof of computational effort. The function f is 
chosen so that: 

1. The function f is not amenable to amortization or mass production; in particular, 
computing the value f(m, sender, Alice, time) does not help in computing the 
value f(m, sender, Bob, time). This is key to fighting spam: the function must be 
recomputed for each recipient (and for any other change of parameters, including 
time). 

2. There is a “hardness” parameter to vary the cost of computing f, allowing it to 
grow as necessary to accommodate the increased computing power, as predicted by 
Moore’s Law. 

3. There is an important difference in the costs of computing f and of checking f: 
the cost of sending a message should grow much more quickly as a function of the 
hardness parameter than the cost of checking that a tag has been correctly computed. 
For the functions described in [24] and [5], the cost of computing the tag grows 
exponentially in the hardness parameter; the cost of checking a tag grows only 
linearly. This difference allows the technique to accommodate great increases in 
sender processor speeds, with negligible additional burden on the receiver. 

We want the cost of checking a computation to be very low so that the ability to wage 
a denial of service attack against a receiver is essentially unchanged. In other words, 
we don’t want the spam-fighting tools to aid in denial of service attacks. In fact, the 
pricing via processing approach has also been suggested as a mechanism for preventing 
distributed denial of service attacks [36] and for metering access to web pages [30]. 
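The three requirements can be made concrete with a small sketch. It uses partial hash inversion with SHA-256 as a stand-in pricing function; this is an illustrative assumption, not the specific constructions of [24] or [5]. The tag is a nonce such that the hash of the message, sender, receiver, time and nonce starts with a given number of zero bits. The function names and the example addresses are hypothetical.

```python
import hashlib
from itertools import count

def _digest(m, sender, receiver, time, nonce):
    """Hash all parameters together, so a tag is bound to one recipient and time."""
    data = "\x00".join([m, sender, receiver, time, str(nonce)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def check_tag(m, sender, receiver, time, nonce, hardness):
    """Receiver side: one hash; the top `hardness` bits must all be zero."""
    return _digest(m, sender, receiver, time, nonce) >> (256 - hardness) == 0

def compute_tag(m, sender, receiver, time, hardness):
    """Sender side: brute-force search, about 2**hardness hashes on average."""
    for nonce in count():
        if check_tag(m, sender, receiver, time, nonce, hardness):
            return nonce

z = compute_tag("hello", "alice@example.org", "bob@example.org", "2003-12-15", 12)
```

Because the recipient is part of the hashed input, a tag found for one recipient gives no head start for another (assuming the hash behaves like a random function). Raising `hardness` by one doubles the sender's expected work, while in this sketch the receiver's check stays at a single hash.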

Dwork and Naor [24] called such a function a pricing function because the proposal 
is fundamentally an economic one: machines that currently send hundreds of thousands 
of spam messages each day, could, at the 10-second price, send only eight thousand. 

2.1 Memory-Bound Functions 

The pricing functions proposed in [24] and other works such as [5] are all CPU-bound. 
The CPU-bound approach might suffer from a possible mismatch in processing among 
different types of machines (PDAs versus servers), and in particular between old 
machines and the (presumed new, top of the line) machines that could be used by a 
high-tech spam service. A variant on the pricing via processing approach outlined 
above is to exploit the fact that fast, low-latency memory (caches) is small in 
computers. In order to address these disparities, Abadi et al. [1] proposed an 
alternative computational approach based on memory latency: design a pricing function 
requiring a moderately large number of scattered memory accesses. The ratio of memory 
latencies among machines built in the last five years is less than four; the comparable 
ratio for desktop CPU speeds is over 10. The CPU speed gap is even greater if one 
includes low-power devices such as PDAs, which are also used for e-mail. Thus, functions 
whose cost is principally due to memory latency may be more equitable as pricing 
functions than CPU-bound functions. 

This raises an interesting question: which functions require many random accesses 
to a large memory? More specifically, can we suggest functions that satisfy the above 
requirements of a pricing function and in which memory access dominates CPU 
work for all “reasonable” settings? Dwork, Goldberg and Naor [23] have explored this 
direction and proposed functions based on random walks in a large shared random table. 
For an abstract version of their proposal they gave a tight lower bound on 
the amortized number of memory accesses needed to compute an acceptable proof of effort, 
based on idealized hash functions (random oracles). They also suggested a very efficient 
concrete implementation of the abstract function, inspired by the RC4 stream cipher. 
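
The flavour of a table-walk pricing function can be sketched in a few lines of Python. 
This is only a toy under stated assumptions and is not the function of [23]: the table 
size, the state update, the use of SHA-256 and all names (make_table, walk, prove, verify) 
are hypothetical choices made for illustration. The point it shows is that each table 
index depends on the word read just before, so the accesses are scattered and cannot be 
issued in advance, while the verifier repeats only the one successful walk. 

```python
import hashlib
import random
from itertools import count

TABLE_BITS = 16                 # toy size; a real table must be far larger than any cache
TABLE_SIZE = 1 << TABLE_BITS
MASK64 = (1 << 64) - 1


def make_table(seed):
    """Build the fixed, public table of pseudo-random 32-bit words (illustrative)."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(TABLE_SIZE)]


def walk(table, message, trial, steps):
    """Walk through the table; every index depends on the word read just before."""
    start = hashlib.sha256(message + trial.to_bytes(8, "big")).digest()
    state = int.from_bytes(start[:8], "big")
    for _ in range(steps):
        word = table[state % TABLE_SIZE]          # data-dependent (scattered) access
        state = (state * 0x9E3779B97F4A7C15 + word) & MASK64
        state ^= state >> 32                      # fold high bits into the next index
    return hashlib.sha256(state.to_bytes(8, "big")).digest()


def is_acceptable(digest, effort_bits):
    """A walk succeeds if its digest starts with `effort_bits` zero bits."""
    return int.from_bytes(digest, "big") >> (256 - effort_bits) == 0


def prove(table, message, effort_bits, steps):
    """Sender: try walks until one succeeds; return the index of that walk."""
    for trial in count():
        if is_acceptable(walk(table, message, trial, steps), effort_bits):
            return trial


def verify(table, message, trial, effort_bits, steps):
    """Receiver: repeat the single successful walk and check its digest."""
    return is_acceptable(walk(table, message, trial, steps), effort_bits)
```

In this sketch the sender's expected cost grows with both effort_bits and steps, whereas 
the receiver always pays for one walk of length steps. 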

In addition to their application for spam fighting, the memory-bound functions of 
[23] have been proposed to prevent Sybil Attacks [44], where an adversary pretends to 
be many different processors (which is very relevant in peer-to-peer applications [21]). 



3 Other Applications of Moderate Hardness 

We now list a few other areas where applying moderately hard functions has proved useful. 
As one can see, the range is quite large. 

Time-Locks: Is it possible to “ice-lock” a piece of information so that it will only be 
available in the far future but not in the near future? This is one of the more intuitive 
applications of moderately hard functions, i.e. the data is encrypted with a weak key 
so that finding it must take time. Several researchers have considered this case since 
the early days of cryptography. However, most of them ignored the issue of parallel 
attacks: simply taking a DES key and revealing a few bits is not a real time-lock, 
since the adversary may utilize a parallel machine for the exhaustive search, thus 
speeding up the opening of the lock significantly. The first scheme taking into account 
the parallel power of the attacker is the one by Rivest, Shamir and Wagner [42]. They suggested 
using the “power function”, i.e. computing $f(x) = g^{2^x} \bmod N$ where $N$ is a 
product of two large primes*. Without knowing the factorization of N the best that 
is known is repeated squaring, an inherently sequential computation. The scheme 
has been used to create a time capsule called LCS35 at MIT [41]. In general the 
power function is very useful and has been used in many other constructions (see 
below). 
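
The asymmetry behind the power function can be illustrated with a small Python sketch. 
It assumes a base g coprime to N and uses tiny hypothetical primes; the function names 
are illustrative and not taken from [42]. Without the factorization, the value is obtained 
by t successive squarings; with it, Euler's theorem lets the creator of the lock reduce 
the exponent 2^t modulo phi(N) first. 

```python
from math import gcd


def power_by_squaring(g, t, n):
    """Compute g^(2^t) mod n with t sequential modular squarings (the slow path)."""
    y = g % n
    for _ in range(t):
        y = (y * y) % n      # each squaring needs the result of the previous one
    return y


def power_with_trapdoor(g, t, p, q):
    """Compute g^(2^t) mod p*q quickly when the factors p and q are known."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(g, n) != 1:
        raise ValueError("g must be coprime to n for the exponent reduction")
    e = pow(2, t, phi)       # reduce the exponent modulo phi(n) (Euler's theorem)
    return pow(g, e, n)


# Toy parameters only; a real modulus would be a product of two large primes.
p, q = 1009, 1013
slow = power_by_squaring(5, 1000, p * q)
fast = power_with_trapdoor(5, 1000, p, q)
```

Both calls return the same residue, but the first performs t squarings one after another 
while the second performs two modular exponentiations whose cost grows only with the bit 
lengths of t and N. 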

Timed Commitments: A string commitment protocol allows a sender to commit, to 
a receiver, to some value. The protocol has two phases. At the end of the commit 
phase the receiver has gained no information about the committed value, while 
after the reveal phase the receiver is assured that the revealed value is indeed the 
one to which the sender originally committed. Timed commitments, defined and 
constructed by Boneh and Naor [12], are an extension of the standard notion of 
commitments in which there is a potential forced opening phase permitting the 
receiver, by computation of some moderately hard function, to recover the committed 
value without the help of the committer. Furthermore, the future recoverability of 
the committed value is verifiable: if the commit phase ends successfully, then the 
receiver is correctly convinced that forced opening will yield the value. In [12] such 
commitments based on the power function were constructed. 
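
The two phases of an ordinary (untimed) string commitment can be sketched as follows. 
This is a common hash-based construction, not the scheme of [12]: it assumes SHA-256 
behaves like an idealized hash function, its hiding property is only heuristic, and it 
has no forced opening phase, which is exactly the feature timed commitments add. The 
function names are hypothetical. 

```python
import hashlib
import hmac
import secrets


def commit(value):
    """Commit phase: return (commitment, nonce); only the commitment is sent."""
    nonce = secrets.token_bytes(32)                  # fresh randomness hides the value
    commitment = hashlib.sha256(nonce + value).digest()
    return commitment, nonce


def verify_opening(commitment, value, nonce):
    """Reveal phase: the receiver checks the revealed value against the commitment."""
    expected = hashlib.sha256(nonce + value).digest()
    return hmac.compare_digest(commitment, expected)


commitment, nonce = commit(b"bid: 42")
```

If the sender never reveals the nonce, the receiver of this plain commitment is stuck; 
in a timed commitment the receiver could instead recover the value through a moderately 
hard computation. 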

Fair Contract Signing: An important application of, as well as motivation for, the timed 
commitments of [12] is fair contract signing: two mutually suspicious parties wish 
to exchange signatures on a contract. How can one party be assured that once it has 
provided its signature on the contract it will also receive the other party's signature 
(on the same or another message)? The problem of fair contract signing was the 
impetus for much of the early research on cryptographic protocols. Protocols where no 
third party (judge, referee) is involved are based on the gradual release of 
signatures/secrets. Examples include [10,28,17,20,38,12]. In such cases it can be shown 
that timing considerations are essential for fairness. 

Collective Coin Flipping: Another application of timed commitments is to the problem 
of collective coin flipping. Suppose that two parties A and B wish to flip a coin in 
such a manner that (i) the value of the coin is unbiased and well defined even if one of 
the parties does not follow the protocol (if neither follows it, then it is a lost 
cause), and (ii) if both parties follow the protocol, then they agree on the same value 
for the coin. The results of Cleve and Impagliazzo [16,18] imply that without timing 
considerations it is impossible to achieve this goal. On the other hand, in [12] timed 
commitments are used to construct such protocols, by making weak assumptions on 
the clocks of the two parties. 

* This is related to the random access property of the Blum-Blum-Shub generator [11]. 

Key Escrow: Suppose that a central authority wishes to escrow the keys of the users in 
order to facilitate wiretapping. This is of course very controversial and one would like 
to limit the power this central authority has. Bellare and Goldwasser [7,8] suggested 
“time capsules” for key escrowing in order to deter widespread wiretapping. A major 
issue there is to verify at escrow-time that the right key is escrowed. Similar issues 
arise in the timed commitment work, where the receiver should make sure that at 
the end of the commit phase the value is recoverable. Other applications include 
Goldschlag and Stubblebine [33] and Syverson [45]. 

Benchmarking: Cai et al [13] were interested in coming up with uncheatable bench- 
marks, i.e. methods that allow a system owner to demonstrate the computational 
power of its system. They suggested using the power function for these purposes. 
Note that here no measures are taken or needed in order to ensure the correctness 
of the values. 

Zero-knowledge Protocols: Using moderately hard functions enables us to construct 
zero-knowledge protocols in various settings where doing so without timing 
considerations is either not known to be possible or provably impossible. This includes 
concurrent zero-knowledge [26,37,43,15], resettable zero-knowledge [14,25] and three-round 
zero-knowledge [32,25]. Furthermore, Dwork and Stockmeyer [27] investigated the 
possibility of constructing nontrivial zero-knowledge interactive proofs under the 
assumption that the prover is computationally bounded during the execution of the 
protocol. 



4 Open Problems in Moderately Hard Functions 

We view the focus on moderate hardness as an adjustment of the theoretical models 
of computation to the constraints - and the benefits - of the real world. In the real 
world, we have clocks, processors have cycle speeds and memory latencies behave in 
certain ways. Why should the theory ignore them? We would like to see a comprehensive 
theory of moderately hard functions. Ideally, given such a theory we would be able to 
characterize, for most tasks (both those described above and unforeseen ones), 
the assumptions needed, and to offer good schemes for achieving them. 

There are many interesting research questions that are largely unexplored in this area. 
Some of the questions mirror the hard functions world and some are unique to this area. 
Also some questions are relevant to all or most applications described above, whereas 
others are more specialized. We now propose several research directions. This list is the 
outcome of joint work with Dan Boneh and Cynthia Dwork. 

Unifying Assumption: In the intractable world we have the notion of a one-way 
function, which is both necessary for almost any interesting task^ and sufficient for many 
of them. Is there such a unifying assumption in the moderately hard world? 
Alternatively, is there evidence for the lack of such an assumption (e.g. in the style of 
an oracle separation, à la Impagliazzo and Rudich [35])? 

^ Threshold secret sharing and encryption using a very long shared key (one-time pad) are an 
exception. 

Precise Modelling: Unlike the tractable vs. intractable case, where the exact machine 
model (Turing Machines, RAMs) is not very significant, here it may matter, given that 
exact time estimates may be important. For each application we have in mind we have to 
describe the exact power of the adversary as well as a model of computation. Such 
modelling was carried out in the work on memory-bound functions [23]. Given the 
diversity of applications of moderately hard functions, the issue of a unifying model 
comes up. 

Hardness Amplification: Suppose that we have a problem that is somewhat hard and 
we would like to come up with one that is harder. How should we proceed? For 
example, can we amplify a problem by iterating it on itself? 

Hardness vs. Randomness: A question that is relevant to some applications is the con- 
nection between (moderate) hardness and (moderate) pseudorandomness. This is 
particularly important for applications such as timed commitment, where the 
protection of the data has to ensure (temporary) indistinguishability from randomness. 
Note that following the existing reductions of the intractable world, which go from 
one-way functions (or, for simplicity, permutations) to pseudo-random generators, 
is too costly given the iterative nature of the construction. 

Evidence for non-amortization: Suppose that there is an adversary that attempts to 
solve many problems simultaneously and manages to obtain a marginal work fac- 
tor which is smaller than that of individual instances. For some applications, such 
as timed commitment, this is not significant, but for the pricing-via-processing 
approach this is devastating. Therefore the question is: what evidence do we have for 
the infeasibility of mass production? For instance, is it possible to demonstrate that 
if a certain problem is not resilient to amortization, then a single instance of it can 
be solved much more quickly? 

The possibility of assumption-free memory-bound functions: For the memory-bound 
functions an intriguing possibility is to apply recent results from complexity 
theory in order to be able to make unconditional (not relying on any assumptions) 
statements about proposed schemes. One of the more promising directions in recent 
years is the work on lower bounds for branching programs and the RAM model by 
Ajtai [3,4] and Beame et al. [6]. It is not clear how to directly apply such results. 

Immunity to Parallel Attacks: For timed commitment and time-locking, it is impor- 
tant to come up with moderately hard functions that are immune to parallel attacks, 
i.e., solving the challenge when there are many processors available takes essentially 
the same time as when a single one is at hand. In [42, 12] and other works the power 
function was used, but there is no good argument to show immunity against parallel 
attacks. An intriguing possibility is to reduce worst-case to average case, i.e., find 
a random self reduction. While in the intractable world it is known that there are 
many limitations on random self reductions (see [29]), in this setting it is not clear 
that one cannot use them to demonstrate average-case hardness. In particular, is it 
possible to randomly reduce a P-complete problem to itself? Such a reduction would 
yield a distribution on problems that has at least some inherent sequentiality. More 
specifically, is it possible to use linear programming or lattice basis reduction for 
such purposes? 
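
Why exhaustive key search is not immune to parallel attacks (the DES example under 
Time-Locks) can be seen in a small simulation. The sketch below is illustrative only: 
the 16-bit key space, the use of SHA-256 as the function to invert, and all names are 
assumptions. It counts the longest chain of sequential steps any single worker performs 
when the key space is split into equal slices. 

```python
import hashlib


def digest(key):
    """Toy one-way image of a small integer key (illustrative stand-in for a cipher)."""
    return hashlib.sha256(key.to_bytes(4, "big")).digest()


def search_range(target, start, stop):
    """Scan keys in [start, stop); return (key or None, number of keys tried)."""
    for tried, key in enumerate(range(start, stop), start=1):
        if digest(key) == target:
            return key, tried
    return None, stop - start


def parallel_search_depth(target, key_bits, workers):
    """Simulate independent workers, each scanning its own slice of the key space.

    Returns the recovered key and the largest number of sequential steps
    performed by any single worker, i.e. the parallel running time.
    """
    size = 1 << key_bits
    chunk = -(-size // workers)                      # ceiling division
    found, depth = None, 0
    for w in range(workers):
        key, tried = search_range(target, w * chunk, min((w + 1) * chunk, size))
        depth = max(depth, tried)
        if key is not None:
            found = key
    return found, depth


target = digest(50000)
key_1, depth_1 = parallel_search_depth(target, 16, 1)     # one worker
key_16, depth_16 = parallel_search_depth(target, 16, 16)  # sixteen workers
```

With sixteen workers the sequential depth drops from 50001 steps to 4096, whereas no 
comparable splitting is known for the t successive squarings of the power function. 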

New Candidates for Moderately Hard Functions: We need more proposals for mod- 
erately hard functions with various conjectured (or provable) properties, for instance 
on the amount of memory required to compute them. Areas such as Algebraic Ge- 
ometry, Lattices and Linear Programming are promising. 
Abstract. Expansion of graphs can be given equivalent definitions in 
combinatorial and algebraic terms. This is the most basic connection 
between combinatorics and algebra illuminated by expanders and the 
quest to construct them. The talk will survey how fertile this connection 
has been to both fields, focusing on recent results. In particular, I will 
explain the zigzag product, and how it enables better constructions and 
new applications. 